Abstract

Estimation procedures based on recursive algorithms are interesting and powerful techniques that are able to deal rapidly with (very) large samples of high dimensional data. The collected data may be contaminated by noise so that robust location indicators, such as the geometric median, may be preferred to the mean. In this context, an estimator of the geometric median based on a fast and efficient averaged non linear stochastic gradient algorithm has been developed by Cardot et al. (2013). This work aims at studying more precisely the non asymptotic behavior of this algorithm by giving non asymptotic confidence balls. This new result is based on the derivation of improved $L^2$ rates of convergence as well as an exponential inequality for the martingale terms of the recursive non linear Robbins-Monro algorithm.


Keywords: Functional Data Analysis, Martingales in Hilbert space, Recursive Estimation, Robust Statistics, Spatial Median, Stochastic Gradient Algorithms.


1 Introduction 


Dealing with large samples of observations taking values in high dimensional spaces such as functional spaces is not unusual nowadays. In this context, simple estimators of location such as the arithmetic mean can be greatly influenced by a small number of outlying values. Thus, robust indicators of location may be preferred to the mean. We focus in this work on the estimation of the geometric median, also called $L^1$-median or spatial median. It is a generalization of the real median introduced by Haldane (1948) that can now be computed rapidly, even for large samples in high dimension spaces, thanks to recursive algorithms (see Cardot et al. (2013)).

Let $H$ be a separable Hilbert space; we denote by $\langle \cdot, \cdot \rangle$ its inner product and by $\|\cdot\|$ the associated norm. Let $X$ be a random variable taking values in $H$; the geometric median $m$ of $X$ is defined by:

$$ m := \arg\min_{h \in H} \mathbb{E}\left[ \|X - h\| - \|X\| \right]. \tag{1} $$
Many properties of this median in separable Banach spaces are given by Kemperman (1987), such as existence and uniqueness, as well as robustness (see also the review by Small (1990)). Recently, this median has received much attention in the literature. For example, Minsker (2014) suggests considering, in various statistical contexts, the geometric median of independent estimators to obtain much tighter concentration bounds. In functional data analysis, Kraus and Panaretos (2012) consider resistant estimators of the covariance operators based on the geometric median in order to derive a robust test of equality of the second-order structure for two samples. The geometric median is also chosen to be the central location indicator in various types of robust functional principal components analyses (see Locantore et al. (1999), Gervini (2008) and Bali et al. (2011)). Finally, a general definition of the geometric median on manifolds is given in Arnaudon et al. (2012) with signal processing issues in mind.

Consider a sequence of i.i.d. copies $X_1, X_2, \ldots, X_n, \ldots$ of $X$. A natural estimator $\hat{m}_n$ of $m$, based on $X_1, \ldots, X_n$, is obtained by minimizing the empirical risk

$$ \hat{m}_n := \arg\min_{h \in H} \sum_{i=1}^{n} \left[ \|X_i - h\| - \|X_i\| \right]. \tag{2} $$


Convergence properties of the empirical estimator $\hat{m}_n$ are reviewed in Möttönen et al. (2010) when the dimension of $H$ is finite, whereas the recent work of Chakraborty and Chaudhuri (2014) proposes a deep asymptotic study for random variables taking values in separable Banach spaces.

Given a sample $X_1, \ldots, X_n$, the computation of $\hat{m}_n$ generally relies on a variant of Weiszfeld's algorithm (see e.g. Kuhn (1973)) introduced by Vardi and Zhang (2000). This iterative algorithm is relatively fast (see Beck and Sabach (2014) for an improved version) but it is not suited to very large sets of high-dimensional data since it requires storing all the data. However, huge datasets are not unusual anymore with the development of automatic sensors and smart meters. In this context, Cardot et al. (2013) have developed a much faster recursive algorithm, which does not require storing all the data and can be updated automatically when the data arrive online. The estimation procedure is based on the following simple recursive scheme:


$$ Z_{n+1} = Z_n + \gamma_n \frac{X_{n+1} - Z_n}{\|X_{n+1} - Z_n\|}, \tag{3} $$


where the sequence of steps $(\gamma_n)$ controls the convergence of the algorithm and satisfies the usual conditions for the convergence of Robbins-Monro algorithms (see Section 3). The averaged version of the algorithm is given by

$$ \bar{Z}_{n+1} = \bar{Z}_n + \frac{1}{n+1} \left( Z_{n+1} - \bar{Z}_n \right), \tag{4} $$

with $\bar{Z}_0 = 0$, so that $\bar{Z}_n = \frac{1}{n} \sum_{i=1}^{n} Z_i$. The averaging step described in (4), and first studied in Polyak and Juditsky (1992), allows a considerable improvement of the convergence of the initial Robbins-Monro algorithm. It is shown in Cardot et al. (2013) that the recursive averaged estimator $\bar{Z}_n$ and the empirical estimator $\hat{m}_n$ have the same Gaussian limiting distribution. In infinite dimensional spaces, this nice result heavily relies on the (locally) strong convexity properties of the objective function to be minimized. Note that Bach (2014) adopts an analogous recursive point of view for logistic regression under slightly different conditions, called self-concordance, which involve uniform conditions on the third order derivatives of the objective function.
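The recursion (3) and its averaged version (4) can be sketched in a few lines for data in $\mathbb{R}^d$. This is only an illustrative sketch, not the authors' implementation: the function name `averaged_sgd_median`, the initialisation $Z_1 = X_1$, and the default values `c_gamma=1.0` and `alpha=0.75` are assumptions made here for the example.

```python
import numpy as np

def averaged_sgd_median(X, c_gamma=1.0, alpha=0.75):
    """One pass of recursion (3) with averaging (4) over the rows of X.

    Assumptions of this sketch: Z_1 is the first observation and the
    steps are gamma_n = c_gamma * n**(-alpha) with alpha in (1/2, 1).
    """
    X = np.asarray(X, dtype=float)
    z = X[0].copy()        # Z_1
    z_bar = z.copy()       # averaged iterate, equal to Z_1 at n = 1
    for n in range(1, X.shape[0]):
        diff = X[n] - z                      # X_{n+1} - Z_n
        norm = np.linalg.norm(diff)
        if norm > 0.0:                       # skip the update if X_{n+1} = Z_n
            z = z + c_gamma * n ** (-alpha) * diff / norm
        z_bar = z_bar + (z - z_bar) / (n + 1)  # running mean of Z_1, ..., Z_{n+1}
    return z, z_bar

# Usage on simulated data whose geometric median is (3, ..., 3).
rng = np.random.default_rng(0)
sample = rng.standard_normal((20000, 5)) + 3.0
z_last, z_avg = averaged_sgd_median(sample)
```

Each observation is used once and then discarded, which is the point made above about not storing the data.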

The aim of this work is to give new arguments in favor of the averaged stochastic gradient algorithm by providing a sharp control of its deviations around the true median, for finite samples. To get such non asymptotic confidence balls, new results about the behavior of the stochastic algorithm are proved: improved convergence rates in quadratic mean compared to those obtained in Cardot et al. (2013), as well as new exponential inequalities for "near" martingale sequences in Hilbert spaces, similar to the seminal result of Pinelis (1994) for martingales. Note that, as far as we know, there are only very few results in the literature on exponential bounds for non linear recursive algorithms (see however Balsubramani et al. (2013) for recursive PCA).


The paper is organized as follows. Section 2 recalls some convexity properties of the geometric median as well as the basic assumptions ensuring the uniqueness of the geometric median. In Section 3, the rates of convergence of the stochastic gradient algorithm are derived in quadratic mean as well as in $L^4$. In Section 4, an exponential inequality is derived borrowing ideas from Tarrès and Yao (2014). It enables us to build non asymptotic confidence balls for the Robbins-Monro algorithm as well as its averaged version. All the proofs are gathered in Section 5.


2 Assumptions on the median and convexity properties 

Let us first state basic assumptions on the median. 

(A1) The random variable $X$ is not concentrated on a straight line: for all $h \in H$, there exists $h' \in H$ such that $\langle h, h' \rangle = 0$ and

$$ \mathrm{Var}\left( \langle h', X \rangle \right) > 0. $$

(A2) $X$ is not concentrated around single points: there is a constant $C > 0$ such that for all $h \in H$:

$$ \mathbb{E}\left[ \|X - h\|^{-1} \right] \le C. $$

Assumption (A1) ensures that the median $m$ is uniquely defined (Kemperman, 1987). Assumption (A2) is closely related to small ball probabilities and to the dimension of $H$. It was proved in Chaudhuri (1992) that when $H = \mathbb{R}^d$, assumption (A2) is satisfied when $d \ge 2$ under classical assumptions on the density of $X$. A detailed discussion on assumption (A2) and its connection with small ball probabilities can be found in Cardot et al. (2013).
We now recall some results about convexity and robustness of the geometric median. We denote by $G : H \to \mathbb{R}$ the convex function we would like to minimize, defined for all $h \in H$ by

$$ G(h) := \mathbb{E}\left[ \|X - h\| - \|X\| \right]. \tag{5} $$

This function is Fréchet differentiable on $H$; we denote by $\Phi$ its Fréchet derivative, and for all $h \in H$:

$$ \Phi(h) := \nabla_h G = -\mathbb{E}\left[ \frac{X - h}{\|X - h\|} \right]. $$

Under previous assumptions, $m$ is the unique zero of $\Phi$.

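Let us define $U_{n+1} := -\frac{X_{n+1} - Z_n}{\|X_{n+1} - Z_n\|}$ and let us introduce the sequence of $\sigma$-algebras $\mathcal{F}_n := \sigma(Z_1, \ldots, Z_n) = \sigma(X_1, \ldots, X_n)$. For all integer $n \ge 1$,

$$ \mathbb{E}\left[ U_{n+1} \mid \mathcal{F}_n \right] = \Phi(Z_n). \tag{6} $$

The sequence $(\xi_n)_n$ defined by $\xi_{n+1} := \Phi(Z_n) - U_{n+1}$ is a martingale difference sequence with respect to the filtration $(\mathcal{F}_n)$. Moreover, we have for all $n$, $\|\xi_{n+1}\| \le 2$ and

$$ \mathbb{E}\left[ \|\xi_{n+1}\|^2 \mid \mathcal{F}_n \right] \le 1 - \|\Phi(Z_n)\|^2 \le 1. \tag{7} $$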
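As a short check of the bound (7), the following worked derivation may help. It uses only (6), the definition of $\xi_{n+1}$, the fact that $Z_n$ is $\mathcal{F}_n$-measurable, and $\|U_{n+1}\| \le 1$ (with equality whenever $X_{n+1} \neq Z_n$).

```latex
\begin{align*}
\mathbb{E}\left[ \|\xi_{n+1}\|^2 \mid \mathcal{F}_n \right]
  &= \|\Phi(Z_n)\|^2
     - 2 \left\langle \Phi(Z_n), \mathbb{E}\left[ U_{n+1} \mid \mathcal{F}_n \right] \right\rangle
     + \mathbb{E}\left[ \|U_{n+1}\|^2 \mid \mathcal{F}_n \right] \\
  &= \mathbb{E}\left[ \|U_{n+1}\|^2 \mid \mathcal{F}_n \right] - \|\Phi(Z_n)\|^2 \\
  &\le 1 - \|\Phi(Z_n)\|^2 \le 1 .
\end{align*}
```

The almost sure bound $\|\xi_{n+1}\| \le 2$ follows from the triangle inequality, since $\|\Phi(Z_n)\| \le 1$ and $\|U_{n+1}\| \le 1$.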

Algorithm (3) can be written as a Robbins-Monro or a stochastic gradient algorithm:

$$ Z_{n+1} - m = Z_n - m - \gamma_n \Phi(Z_n) + \gamma_n \xi_{n+1}. \tag{8} $$


We now consider the Hessian of $G$, which is denoted by $\Gamma_h : H \to H$. It satisfies (see Gervini (2008))

$$ \Gamma_h = \mathbb{E}\left[ \frac{1}{\|X - h\|} \left( I_H - \frac{(X - h) \otimes (X - h)}{\|X - h\|^2} \right) \right], $$

where $I_H$ is the identity operator in $H$ and $u \otimes v(h) = \langle u, h \rangle v$ for all $u, v, h \in H$. The following (local) strong convexity properties will be useful (see Cardot et al. (2013) for proofs).


Proposition 2.1 (Cardot et al. (2013)). Under assumptions (A1) and (A2), for any real number $A > 0$, there is a positive constant $c_A$ such that for all $h \in H$ with $\|h\| \le A$, and for all $h' \in H$:

$$ c_A \|h'\|^2 \le \langle h', \Gamma_h h' \rangle \le C \|h'\|^2. $$

As a particular case, there is a positive constant $c_m$ such that for all $h' \in H$:

$$ c_m \|h'\|^2 \le \langle h', \Gamma_m h' \rangle \le C \|h'\|^2. \tag{9} $$


The following corollary recalls some properties of the spectrum of the Hessian of $G$, in particular on the spectrum of $\Gamma_m$.

Corollary 2.1. Under assumptions (A1) and (A2), for all $h \in H$, there is an increasing sequence of non-negative eigenvalues $(\lambda_{j,h})$ and an orthonormal basis $(v_{j,h})$ of eigenvectors of $\Gamma_h$ such that

$$ \sigma(\Gamma_h) = \{ \lambda_{j,h}, \ j \in \mathbb{N} \}, \qquad \lambda_{j,h} \le C. $$

Moreover, if $\|h\| \le A$, for all $j \in \mathbb{N}$ we have $c_A \le \lambda_{j,h} \le C$.

As a particular case, the eigenvalues $\lambda_{j,m}$ of $\Gamma_m$ satisfy $c_m \le \lambda_{j,m} \le C$, for all $j \in \mathbb{N}$.


The bounds are an immediate consequence of Proposition 2.1. Remark that with these different convexity properties of the geometric median, we are close to the framework of Bach (2014). The difference comes from the fact that $G$ does not satisfy the generalized self-concordance assumption which is central in the latter work.


3 Rates of convergence of the Robbins-Monro algorithms 

If the sequence $(\gamma_n)_n$ of stepsizes fulfills the following classical assumptions:

$$ \sum_{n \ge 1} \gamma_n^2 < \infty \quad \text{and} \quad \sum_{n \ge 1} \gamma_n = \infty, $$

the recursive estimator $Z_n$ is strongly consistent (see Cardot et al. (2013)). The first condition on the stepsizes ensures that the recursive algorithm converges towards some value in $H$ whereas the second condition forces the algorithm to converge to $m$, the unique minimizer of $G$.
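The two stepsize conditions can be illustrated numerically for polynomial steps $\gamma_n = c_\gamma n^{-\alpha}$ with $\alpha \in (1/2, 1)$. The sketch below is only an illustration: the helper `step_sizes` and the values `c_gamma=1.0`, `alpha=0.75` are assumptions chosen for the example. The partial sums of $\gamma_n$ keep growing while those of $\gamma_n^2$ stay bounded.

```python
import numpy as np

def step_sizes(n_max, c_gamma=1.0, alpha=0.75):
    """Return gamma_1, ..., gamma_{n_max} with gamma_n = c_gamma * n**(-alpha)."""
    n = np.arange(1, n_max + 1, dtype=float)
    return c_gamma * n ** (-alpha)

gammas = step_sizes(100_000)
sum_gamma = gammas.sum()            # grows like n**(1 - alpha): divergent series
sum_gamma_sq = (gammas ** 2).sum()  # bounded, since 2 * alpha > 1
print(sum_gamma, sum_gamma_sq)
```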

From now on, $Z_1$ is chosen so that it is bounded (consider for example $Z_1 = X_1 \mathbf{1}_{\{\|X_1\| \le M'\}}$ for some non negative constant $M'$). Consequently, there is a positive constant $M$ such that for all $n \ge 1$:

$$ \mathbb{E}\left[ \|Z_n - m\|^2 \right] \le M. $$


Let us consider now sequences $(\gamma_n)_n$ of the form $\gamma_n = c_\gamma n^{-\alpha}$ where $c_\gamma$ is a positive constant, and $\alpha \in (1/2, 1)$. In order to get confidence balls for the median, the following additional assumption is supposed to hold.

(A3) There is a positive constant $C$ such that for all $h \in H$:

$$ \mathbb{E}\left[ \|X - h\|^{-2} \right] \le C. $$


This assumption ensures that the remainder term in the Taylor approximation to the gradient is bounded (see Cardot et al. (2013)). It is also assumed in Chakraborty and Chaudhuri (2014) for deriving the asymptotic normality of the empirical median estimator. Remark that for the sake of simplicity, we have considered the same constant $C$ in (A2) and (A3). As in (A2), Assumption (A3) is closely related to small ball probabilities and when $H = \mathbb{R}^d$, this assumption is satisfied when $d \ge 3$ under weak conditions.

We state now the first new and important result on the rates of convergence in quadratic mean of the Robbins-Monro algorithm. A comparison with Proposition 3.2 in Cardot et al. (2013) reveals that the logarithmic term has disappeared, as well as the constant $C_N$ that was related to a sequence $(\Omega_N)_N$ of events whose probability was tending to one.


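Theorem 3.1. Assuming (A1)-(A3) hold, the algorithm $(Z_n)$ defined by (3), with $\gamma_n = c_\gamma n^{-\alpha}$, converges in quadratic mean, for all $\alpha \in (1/2, 1)$ and for all $\alpha < \beta < 3\alpha - 1$, with the following rates:

$$ \mathbb{E}\left[ \|Z_n - m\|^2 \right] = O\left( \frac{1}{n^{\alpha}} \right), \tag{10} $$

$$ \mathbb{E}\left[ \|Z_n - m\|^4 \right] = O\left( \frac{1}{n^{\beta}} \right). \tag{11} $$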


Upper bounds for the rates of convergence at order four are also given because they will 
be useful in several proofs. Remark that obtaining better rates of convergence at the order 
four would also be possible at the expense of longer proofs, but it is not necessary here. The 
proof of this theorem relies on two technical lemmas. The following one gives an upper 
bound of the quadratic mean error. 


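Lemma 3.1. Assuming (A1)-(A3) hold, there are positive constants $C_1, C_2, C_3, C_4$ such that for all $n \ge 1$:

$$ \mathbb{E}\left[ \|Z_n - m\|^2 \right] \le C_1 e^{-C_4 n^{1-\alpha}} + \frac{C_2}{n^{\alpha}} + C_3 \sup_{n/2 - 1 \le k \le n} \mathbb{E}\left[ \|Z_k - m\|^4 \right]. \tag{12} $$

The proof of Lemma 3.1 is given in Section 5.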


Lemma 3.2. Assuming the three assumptions (A1) to (A3), for all $\alpha \in (1/2, 1)$, there are a rank $n_\alpha$ and positive constants $C'_1, C'_2$ such that for all $n \ge n_\alpha$:

$$ \mathbb{E}\left[ \|Z_{n+1} - m\|^4 \right] \le \left( 1 - \frac{1}{n} \right)^2 \mathbb{E}\left[ \|Z_n - m\|^4 \right] + \frac{C'_1}{n^{3\alpha}} + C'_2 \frac{1}{n^{2\alpha}} \mathbb{E}\left[ \|Z_n - m\|^2 \right]. \tag{13} $$

The proof of Lemma 3.2 is given in Section 5. The next result gives the exact rate of convergence in quadratic mean and states that it is not possible to get the parametric rates of convergence with the Robbins-Monro algorithm when $\alpha \in (1/2, 1)$.

Proposition 3.1. Assume (A1)-(A3) hold. For all $\alpha \in (1/2, 1)$, there is a positive constant $C'$ such that for all $n \ge 1$,

$$ \mathbb{E}\left[ \|Z_n - m\|^2 \right] \ge \frac{C'}{n^{\alpha}}. $$
4 Non asymptotic confidence balls


4.1 Non asymptotic confidence balls for the Robbins-Monro algorithm


The aim is now to derive an upper bound for $\mathbb{P}\left[ \|Z_n - m\| \ge t \right]$, for $t > 0$. A simple first result can be obtained by applying Markov's inequality and Theorem 3.1. We give below a sharper bound that relies on exponential inequalities that are close to the ones given in Theorem 3.1 in Pinelis (1994). As explained in Remark 4.2 below, it was not possible to apply directly Theorem 3.1 of Pinelis (1994) and the following proposition gives an analogous exponential inequality in the case where we do not have exactly a sequence of martingale differences.


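Proposition 4.1. Let $(\beta_{n,k})_{(k,n) \in \mathbb{N} \times \mathbb{N}}$ be a sequence of linear operators on $H$ and $(\xi_n)$ be a sequence of $H$-valued martingale differences adapted to a filtration $(\mathcal{F}_n)$. Moreover, let $(\gamma_n)$ be a sequence of positive real numbers. Then, for all $r > 0$ and for all $n \ge 1$,

$$ \mathbb{P}\left[ \Big\| \sum_{k=1}^{n-1} \gamma_k \beta_{n-1,k} \xi_{k+1} \Big\| \ge r \right] \le 2 e^{-r} \Big\| \prod_{j=2}^{n} \left( 1 + \mathbb{E}\left[ e^{\|\beta_{n-1,j-1} \gamma_{j-1} \xi_j\|} - 1 - \|\beta_{n-1,j-1} \gamma_{j-1} \xi_j\| \,\Big|\, \mathcal{F}_{j-1} \right] \right) \Big\|_{\infty} $$

$$ \le 2 \exp\left( -r + \Big\| \sum_{j=2}^{n} \mathbb{E}\left[ e^{\|\beta_{n-1,j-1} \gamma_{j-1} \xi_j\|} - 1 - \|\beta_{n-1,j-1} \gamma_{j-1} \xi_j\| \,\Big|\, \mathcal{F}_{j-1} \right] \Big\|_{\infty} \right), $$

where $\|\cdot\|_{\infty}$ denotes the essential supremum of a real random variable.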


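The proof of Proposition 4.1 is postponed to Section 5. As in Tarrès and Yao (2014), it enables us to give a sharp upper bound for $\mathbb{P}\left[ \big\| \sum_{k=1}^{n-1} \gamma_k \beta_{n-1,k} \xi_{k+1} \big\| \ge t \right]$.

Corollary 4.1. Let $(\beta_{n,k})$ be a sequence of linear operators on $H$, $(\xi_n)$ be a sequence of $H$-valued martingale differences adapted to a filtration $(\mathcal{F}_n)$ and $(\gamma_n)$ be a sequence of positive real numbers. Let $(N_n)$ and $(\sigma_n^2)$ be two deterministic sequences such that

$$ N_n \ge \sup_{k \le n-1} \|\beta_{n-1,k} \gamma_k \xi_{k+1}\| \ \text{a.s.} \quad \text{and} \quad \sigma_n^2 \ge \sum_{k=1}^{n-1} \mathbb{E}\left[ \|\beta_{n-1,k} \gamma_k \xi_{k+1}\|^2 \mid \mathcal{F}_k \right]. $$

For all $t > 0$ and all $n \ge 1$,

$$ \mathbb{P}\left[ \Big\| \sum_{k=1}^{n-1} \gamma_k \beta_{n-1,k} \xi_{k+1} \Big\| \ge t \right] \le 2 \exp\left( - \frac{t^2}{2 \left( \sigma_n^2 + t N_n / 3 \right)} \right). $$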
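To get a feel for the Bernstein-type bound of Corollary 4.1, the sketch below evaluates $2\exp\left(-t^2 / (2(\sigma_n^2 + t N_n/3))\right)$ and inverts it in $t$. The function names `bernstein_tail` and `bernstein_radius` are hypothetical helpers introduced here, and the cap at 1 is a convenience added for this example since a probability bound above 1 carries no information.

```python
import math

def bernstein_tail(t, sigma2, big_n):
    """Bound 2*exp(-t^2 / (2*(sigma2 + t*big_n/3))), capped at 1.

    sigma2 and big_n play the roles of sigma_n^2 and N_n; both are
    assumed positive in this sketch.
    """
    if t <= 0:
        return 1.0
    return min(1.0, 2.0 * math.exp(-t * t / (2.0 * (sigma2 + t * big_n / 3.0))))

def bernstein_radius(delta, sigma2, big_n):
    """Value of t at which the uncapped bound equals delta, for 0 < delta < 1.

    Solves t^2 = 2*log(2/delta)*(sigma2 + t*big_n/3), a quadratic in t.
    """
    log_term = math.log(2.0 / delta)
    a = log_term * big_n / 3.0
    return a + math.sqrt(a * a + 2.0 * log_term * sigma2)

t_95 = bernstein_radius(0.05, sigma2=1.0, big_n=0.5)
print(t_95, bernstein_tail(t_95, 1.0, 0.5))
```

The radius has a sub-Gaussian part driven by `sigma2` and a linear part driven by `big_n`, which is the usual shape of Bernstein-type deviations.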


In order to apply these results, let us linearize the gradient around $m$ in decomposition (8),

$$ Z_{n+1} - m = Z_n - m - \gamma_n \Gamma_m (Z_n - m) + \gamma_n \xi_{n+1} - \gamma_n \delta_n, \tag{14} $$

where $\delta_n := \Phi(Z_n) - \Gamma_m (Z_n - m)$, and introduce, for all $n \ge 1$, the following operators:

$$ \alpha_n := I_H - \gamma_n \Gamma_m, \qquad \beta_n := \prod_{k=1}^{n} \alpha_k = \prod_{k=1}^{n} \left( I_H - \gamma_k \Gamma_m \right), \qquad \beta_0 := I_H. $$

By induction, (14) yields

$$ Z_n - m = \beta_{n-1} (Z_1 - m) + \beta_{n-1} M_n - \beta_{n-1} R_n, \tag{15} $$

with

$$ R_n := \sum_{k=1}^{n-1} \gamma_k \beta_k^{-1} \delta_k, \qquad M_n := \sum_{k=1}^{n-1} \gamma_k \beta_k^{-1} \xi_{k+1}. $$
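The induction leading to (15) can be spelled out as follows. This is a sketch of the unrolling step, written with $\alpha_n = I_H - \gamma_n \Gamma_m$; it uses the fact that the operators $\alpha_j$ commute, being functions of $\Gamma_m$, and it treats $\beta_{n-1} \beta_k^{-1}$ as shorthand for the product $\prod_{j=k+1}^{n-1} \alpha_j$.

```latex
\begin{align*}
Z_{n+1} - m
  &= \alpha_n (Z_n - m) + \gamma_n \xi_{n+1} - \gamma_n \delta_n \\
Z_n - m
  &= \Big( \prod_{j=1}^{n-1} \alpha_j \Big) (Z_1 - m)
     + \sum_{k=1}^{n-1} \Big( \prod_{j=k+1}^{n-1} \alpha_j \Big) \gamma_k \left( \xi_{k+1} - \delta_k \right) \\
  &= \beta_{n-1} (Z_1 - m)
     + \beta_{n-1} \sum_{k=1}^{n-1} \gamma_k \beta_k^{-1} \xi_{k+1}
     - \beta_{n-1} \sum_{k=1}^{n-1} \gamma_k \beta_k^{-1} \delta_k .
\end{align*}
```

The first line is (14) rewritten with $\alpha_n$, the second is obtained by iterating it down to $Z_1$, and the third identifies the martingale term and the remainder term of (15).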


Remark 4.1. Note that we make an abuse of notation because $\beta_k^{-1}$ does not necessarily exist. However, when it exists, the linear operator $\beta_k^{-1}$ is bounded. Moreover, we can make this abuse because, even if $\beta_k$ does not have a continuous inverse, we only need to consider $\beta_{n-1} \beta_k^{-1} := \prod_{j=k+1}^{n-1} \alpha_j$, which are continuous operators for $k \le n - 1$.


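Note that, if $\beta_k$ is invertible for all $k \ge 1$, $(M_n)$ is a martingale sequence adapted to the filtration $(\mathcal{F}_n)$. Moreover,

$$ \mathbb{P}\left[ \|Z_n - m\| \ge t \right] \le \mathbb{P}\left[ \|\beta_{n-1} M_n\| \ge \frac{t}{2} \right] + \mathbb{P}\left[ \|\beta_{n-1} R_n\| \ge \frac{t}{4} \right] + \mathbb{P}\left[ \|\beta_{n-1} (Z_1 - m)\| \ge \frac{t}{4} \right] $$

$$ \le \mathbb{P}\left[ \|\beta_{n-1} M_n\| \ge \frac{t}{2} \right] + 4 \frac{\mathbb{E}\left[ \|\beta_{n-1} R_n\| \right]}{t} + 16 \frac{\mathbb{E}\left[ \|\beta_{n-1} (Z_1 - m)\|^2 \right]}{t^2}. \tag{16} $$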


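In this context, Corollary 4.1 can be written as follows:

Corollary 4.2. Let $(N_n)_{n \ge 1}$ and $(\sigma_n^2)_{n \ge 1}$ be two deterministic sequences such that

$$ N_n \ge \sup_{k \le n-1} \left\| \beta_{n-1} \beta_k^{-1} \gamma_k \xi_{k+1} \right\| \ \text{a.s.} \quad \text{and} \quad \sigma_n^2 \ge \sum_{k=1}^{n-1} \mathbb{E}\left[ \left\| \beta_{n-1} \beta_k^{-1} \gamma_k \xi_{k+1} \right\|^2 \mid \mathcal{F}_k \right]. $$

Then, for all $t > 0$ and for all $n \ge 1$,

$$ \mathbb{P}\left[ \Big\| \sum_{k=1}^{n-1} \beta_{n-1} \beta_k^{-1} \gamma_k \xi_{k+1} \Big\| \ge t \right] \le 2 \exp\left( - \frac{t^2}{2 \left( \sigma_n^2 + t N_n / 3 \right)} \right). $$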

We can now derive non asymptotic confidence balls for the Robbins Monro algorithm. 
Theorem 4.1. Assume that (A1)-(A3) hold. There is a positive constant $C$ such that for all $\delta \in (0, 1)$, there is a rank $n_\delta$ such that for all $n \ge n_\delta$,

$$ \mathbb{P}\left[ \|Z_n - m\| \le \frac{C}{n^{\alpha/2}} \ln\left( \frac{4}{\delta} \right) \right] \ge 1 - \delta. $$


Remark 4.2. Note that we could not apply Theorem 3.1 in Pinelis (1994) to the martingale term $M_n = \sum_{k=1}^{n-1} \gamma_k \beta_k^{-1} \xi_{k+1}$. In fact, two problems are encountered. First, as written in Remark 4.1, $\beta_k^{-1}$ does not necessarily exist. The second problem is that although there is a positive constant $M$ such that $\|\beta_{n-1} M_n\| \le M$ for all $n \ge 1$, the sequence $\|\beta_{n-1}\| \|M_n\|$ may not be convergent ($\|\beta_{n-1}\|$ denotes the usual spectral norm of operator $\beta_{n-1}$).


4.2 Non asymptotic confidence balls for the averaged algorithm


As in Cardot et al. (2013) and Pelletier (2000), we make use of decomposition (14). By summing and applying Abel's transform, we get:

$$ \Gamma_m \bar{T}_n = \frac{1}{n} \left( \frac{T_1}{\gamma_1} - \frac{T_{n+1}}{\gamma_n} + \sum_{k=2}^{n} T_k \left[ \frac{1}{\gamma_k} - \frac{1}{\gamma_{k-1}} \right] - \sum_{k=1}^{n} \delta_k \right) + \frac{1}{n} M_{n+1}, \tag{17} $$

with

$$ T_n := Z_n - m, \qquad \bar{T}_n := \bar{Z}_n - m, \qquad M_{n+1} := \sum_{k=1}^{n} \xi_{k+1}. $$

The last term is the martingale term. Applying Pinelis-Bernstein's Lemma (see Tarrès and Yao (2014), Appendix A) to this term and showing that the other ones are negligible, we get the following non asymptotic confidence balls.

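Theorem 4.2. Assume that (A1)-(A3) hold. For all $\delta \in (0, 1)$, there is a rank $n_\delta$ such that for all $n \ge n_\delta$,

$$ \mathbb{P}\left[ \left\| \Gamma_m \left( \bar{Z}_n - m \right) \right\| \le 4 \left( \frac{2}{3n} + \frac{1}{\sqrt{n}} \right) \ln\left( \frac{4}{\delta} \right) \right] \ge 1 - \delta. $$

Since the smallest eigenvalue $\lambda_{\min}$ of $\Gamma_m$ is strictly positive,

$$ \mathbb{P}\left[ \left\| \bar{Z}_n - m \right\| \le \frac{4}{\lambda_{\min}} \left( \frac{2}{3n} + \frac{1}{\sqrt{n}} \right) \ln\left( \frac{4}{\delta} \right) \right] \ge 1 - \delta. $$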


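Remark 4.3. We can also have a more precise form of the rank $n_\delta$ (see the proof of Theorem 4.2):

$$ n_\delta := \max\left\{ \left( \frac{6 C'_1}{\delta \ln\left( \frac{4}{\delta} \right)} \right)^{\frac{1}{1/2 - \alpha/2}}, \ \left( \frac{6 C'_2}{\delta \ln\left( \frac{4}{\delta} \right)} \right)^{\frac{1}{\alpha - 1/2}}, \ \left( \frac{6 C'_3}{\delta \ln\left( \frac{4}{\delta} \right)} \right)^{2} \right\}, \tag{18} $$

where $C'_1$, $C'_2$ and $C'_3$ are constants. We can remark that the first two terms are the leading ones and if the rate $\alpha$ is chosen equal to $2/3$, they are of the same order, that is $n_\delta = O\left( \left( \frac{1}{\delta \ln(4/\delta)} \right)^{6} \right)$.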


Remark 4.4. We can make an informal comparison of the previous result with the central limit theorem stated in (Cardot et al. (2013), Theorem 3.4), even if the latter result is of asymptotic nature. Under assumptions (A1)-(A3), they have shown that

$$ \sqrt{n} \left( \bar{Z}_n - m \right) \xrightarrow[n \to \infty]{\mathcal{L}} \mathcal{N}\left( 0, \Gamma_m^{-1} \Sigma \Gamma_m^{-1} \right), $$

with

$$ \Sigma = \mathbb{E}\left[ \frac{(X - m)}{\|X - m\|} \otimes \frac{(X - m)}{\|X - m\|} \right]. $$

This implies, with the continuity of the norm in $H$, that for all $t > 0$,

$$ \lim_{n \to \infty} \mathbb{P}\left[ \left\| \sqrt{n} \left( \bar{Z}_n - m \right) \right\| \ge t \right] = \mathbb{P}\left[ \|V\| \ge t \right], $$

where $V$ is a centered $H$-valued Gaussian random vector with covariance operator $\Delta_V = \Gamma_m^{-1} \Sigma \Gamma_m^{-1}$. Operator $\Delta_V$ is self-adjoint and non negative, so that it admits a spectral decomposition $\Delta_V = \sum_{j \ge 1} \eta_j v_j \otimes v_j$, where $\eta_1 \ge \eta_2 \ge \ldots \ge 0$ is the sequence of ordered eigenvalues associated to the orthonormal eigenvectors $v_1, v_2, \ldots$ Using the Karhunen-Loève expansion of $V$, we directly get that

$$ \|V\|^2 = \sum_{j \ge 1} \eta_j V_j^2, $$

where $V_1, V_2, \ldots$ are i.i.d. centered Gaussian variables with unit variance. Thus the distribution of $\|V\|^2$ is a mixture of independent Chi-square random variables with one degree of freedom. Computing the quantiles of $\|V\|$ to build confidence balls would require to know, or to estimate, all the (leading) eigenvalues of the rather complicated operator $\Delta_V$ and this is not such an easy task.

On the other hand, the use of the confidence balls given in Theorem 4.2 only requires the knowledge of $\lambda_{\min}$. This eigenvalue is not difficult to estimate since it can also be written as

$$ \lambda_{\min} = \mathbb{E}\left[ \frac{1}{\|X - m\|} \right] - \lambda_{\max}\left( \mathbb{E}\left[ \frac{(X - m) \otimes (X - m)}{\|X - m\|^3} \right] \right), $$

where $\lambda_{\max}(A)$ denotes the largest eigenvalue of operator $A$.

Remark 4.5. Under previous assumptions and the additional condition $\alpha > 2/3$, it can be shown with decomposition (17) that there is a positive constant $C$ such that

$$ \mathbb{E}\left[ \left\| \bar{Z}_n - m \right\|^2 \right] \le \frac{C}{n}. $$

The averaged algorithm converges at the parametric rate of convergence in quadratic mean.
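As an illustration of Remark 4.4 in $\mathbb{R}^d$, the sketch below estimates $\lambda_{\min}$ through the identity $\lambda_{\min} = \mathbb{E}[1/\|X - m\|] - \lambda_{\max}(\mathbb{E}[(X - m) \otimes (X - m) / \|X - m\|^3])$ by replacing expectations with empirical means, and then evaluates a radius of the form $\frac{4}{\lambda_{\min}} \left( \frac{2}{3n} + \frac{1}{\sqrt{n}} \right) \ln(4/\delta)$ as read from Theorem 4.2. Both function names are hypothetical, plugging an estimate of $m$ and of $\lambda_{\min}$ into the radius is an assumption of this example rather than a result of the paper, and the constants should be checked against the original statement of the theorem.

```python
import numpy as np

def lambda_min_estimate(X, m):
    """Plug-in estimate of the smallest eigenvalue of Gamma_m.

    X is an (N, d) sample and m a point of R^d (for instance an
    estimate of the median); observations equal to m are discarded.
    """
    D = np.asarray(X, dtype=float) - np.asarray(m, dtype=float)
    norms = np.linalg.norm(D, axis=1)
    keep = norms > 0.0
    D, norms = D[keep], norms[keep]
    mean_inv_norm = np.mean(1.0 / norms)
    # Empirical version of E[(X - m) (X - m)^T / ||X - m||^3], a d x d matrix.
    S = (D / norms[:, None] ** 3).T @ D / len(norms)
    lam_max = np.linalg.eigvalsh(S)[-1]   # eigvalsh returns ascending eigenvalues
    return mean_inv_norm - lam_max

def confidence_radius(n, delta, lam_min):
    """Radius (4 / lam_min) * (2 / (3 n) + 1 / sqrt(n)) * ln(4 / delta)."""
    return (4.0 / lam_min) * (2.0 / (3.0 * n) + 1.0 / np.sqrt(n)) * np.log(4.0 / delta)

rng = np.random.default_rng(1)
sample = rng.standard_normal((20000, 3))        # geometric median at the origin
lam_hat = lambda_min_estimate(sample, np.zeros(3))
radius = confidence_radius(len(sample), 0.05, lam_hat)
```

Only one scalar, $\lambda_{\min}$, has to be estimated here, in contrast with the full spectrum of $\Delta_V$ needed for the asymptotic approach.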
5 Proofs


5.1 Proofs of the results given in Section 3

In order to prove Lemma 3.1 we have to introduce a technical lemma which controls the remainder term $\|\delta_n\|$ (see eq. (14)) appearing in the Taylor approximation. This will enable us to bound the term $\beta_{n-1} R_n$ in decomposition (15).


Lemma 5.1. Assuming assumption (A3), there is a constant $C_m$ such that for all $n \ge 1$, almost surely:

$$ \|\delta_n\| \le C_m \|Z_n - m\|^2, \tag{19} $$

where $\delta_n := \Phi(Z_n) - \Gamma_m (Z_n - m)$.

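Proof of Lemma 5.1. Using Taylor's theorem with remainder of integral form, almost surely

$$ \Phi(Z_n) = \int_0^1 \Gamma_{m + t(Z_n - m)} (Z_n - m) \, dt, \tag{20} $$

and

$$ \delta_n = \int_0^1 \left[ \Gamma_{m + t(Z_n - m)} - \Gamma_m \right] (Z_n - m) \, dt. $$

For all $h, h' \in H$, we denote by $\varphi_{h,h'}$ the function defined as follows:

$$ \varphi_{h,h'} : [0, 1] \to H, \qquad t \mapsto \varphi_{h,h'}(t) := \Gamma_{m + th}(h'). $$

Let $U_h : [0, 1] \to \mathbb{R}_+$ and $V_{h,h'} : [0, 1] \to H$ be two random functions defined for all $t \in [0, 1]$ by

$$ U_h(t) := \frac{1}{\|X - m - th\|}, \qquad V_{h,h'}(t) := h' - \frac{\langle X - m - th, h' \rangle (X - m - th)}{\|X - m - th\|^2}. $$

Let $V'_{h,h'}(t) = \frac{d}{dt} V_{h,h'}(t) = \lim_{t' \to 0} \frac{V_{h,h'}(t + t') - V_{h,h'}(t)}{t'}$ and $U'_h(t) = \frac{d}{dt} U_h(t) = \lim_{t' \to 0} \frac{U_h(t + t') - U_h(t)}{t'}$. Let $\varphi'_{h,h'}(t) = \frac{d}{dt} \varphi_{h,h'}(t)$; by dominated convergence, $\varphi_{h,h'}$ is differentiable on $[0, 1]$ and $\|\varphi'_{h,h'}(t)\| \le \mathbb{E}\left[ |U'_h(t)| \|V_{h,h'}(t)\| + |U_h(t)| \|V'_{h,h'}(t)\| \right]$. Using Cauchy-Schwarz inequality,

$$ |U'_h(t)| \le \frac{\|h\|}{\|X - m - th\|^2}, \qquad \|V_{h,h'}(t)\| \le 2 \|h'\|, \qquad \|V'_{h,h'}(t)\| \le \frac{4 \|h\| \|h'\|}{\|X - m - th\|}. $$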
Finally, using assumption (A3),

$$ \|\varphi'_{h,h'}(t)\| \le 6 \|h\| \|h'\| \, \mathbb{E}\left[ \frac{1}{\|X - m - th\|^2} \right] \le 6 \|h\| \|h'\| C. $$

Using previous inequalities, we obtain that for all $h \in H$

$$ \|\Phi(m + h) - \Gamma_m(h)\| \le \int_0^1 \|\Gamma_{m + th}(h) - \Gamma_m(h)\| \, dt $$

$$ \le \int_0^1 \|\varphi_{h,h}(t) - \varphi_{h,h}(0)\| \, dt $$

$$ \le \int_0^1 \sup_{t' \in [0, t]} \|\varphi'_{h,h}(t')\| \, dt $$

$$ \le 6 C \|h\|^2. $$

Taking $h = Z_n - m$, for all $n \ge 1$:

$$ \|\delta_n\| \le C_m \|Z_n - m\|^2, $$

with $C_m = 6C$. □

We can now prove Lemma 3.1.


Proof of Lemma 3.1. We need to study the asymptotic behaviour of the sequence of operators $(\beta_n)_n$. Since $\Gamma_m$ admits a spectral decomposition, we have the upper bound $\|\alpha_n\| \le \sup_j |1 - \gamma_n \lambda_j|$ where $(\lambda_j)$ is the sequence of eigenvalues of $\Gamma_m$. Since for all $j \ge 1$ we have $0 < c_m \le \lambda_j \le C$, there is a rank $n_0$ such that for all $n \ge n_0$, $\gamma_n C < 1$. In particular, for all $n \ge n_0$ we have $\|\alpha_n\| \le 1 - \gamma_n c_m$. Thus, there is a positive constant $c_1$ such that for all $n \ge 1$:

$$ \|\beta_{n-1}\| \le c_1 \exp\left( -\lambda_{\min} \sum_{k=1}^{n-1} \gamma_k \right) \le c_1 \exp\left( -c_m \sum_{k=1}^{n-1} \gamma_k \right), \tag{21} $$

where $\lambda_{\min} > 0$ is the smallest eigenvalue of $\Gamma_m$ and $\|\beta_{n-1}\|$ denotes the spectral norm of operator $\beta_{n-1}$. Similarly, there is a positive constant $c_2$ such that for all integer $n$ and for all integer $k \le n - 1$:

$$ \left\| \beta_{n-1} \beta_k^{-1} \right\| \le c_2 \exp\left( -c_m \sum_{j=k+1}^{n-1} \gamma_j \right). \tag{22} $$

Moreover, for all $n > n_0$ and $k \ge n_0$ such that $k \le n - 1$,

$$ \left\| \beta_{n-1} \beta_k^{-1} \right\| \le \exp\left( -c_m \sum_{j=k+1}^{n-1} \gamma_j \right), \tag{23} $$

see Cardot et al. (2013) for more details. Using decomposition (15) again, we get

$$ \mathbb{E}\left[ \|Z_n - m\|^2 \right] \le 3 \mathbb{E}\left[ \|\beta_{n-1} (Z_1 - m)\|^2 \right] + 3 \mathbb{E}\left[ \|\beta_{n-1} M_n\|^2 \right] + 3 \mathbb{E}\left[ \|\beta_{n-1} R_n\|^2 \right]. \tag{24} $$

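We now bound each term on the right-hand side of the previous inequality.

Step 1: The quasi-deterministic term. Using inequality (21), with the help of an integral test for convergence, for all $n \ge 1$:

$$ \mathbb{E}\left[ \|\beta_{n-1} (Z_1 - m)\|^2 \right] \le c_1^2 \exp\left( -2 c_m \sum_{k=1}^{n-1} \gamma_k \right) \mathbb{E}\left[ \|Z_1 - m\|^2 \right] $$

$$ \le c_1^2 \exp\left( -2 c_m c_\gamma \int_1^n t^{-\alpha} \, dt \right) \mathbb{E}\left[ \|Z_1 - m\|^2 \right] $$

$$ \le c_1^2 M \exp\left( 2 \frac{c_m c_\gamma}{1 - \alpha} \right) \exp\left( -2 \frac{c_m c_\gamma}{1 - \alpha} n^{1 - \alpha} \right). $$

Since $\alpha < 1$, this term converges exponentially fast to 0.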


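Step 2: The martingale term. We have

$$ \|\beta_{n-1} M_n\|^2 = \Big\| \sum_{k=1}^{n-1} \gamma_k \beta_{n-1} \beta_k^{-1} \xi_{k+1} \Big\|^2 $$

$$ = \sum_{k=1}^{n-1} \gamma_k^2 \left\| \beta_{n-1} \beta_k^{-1} \xi_{k+1} \right\|^2 + 2 \sum_{k=1}^{n-1} \sum_{k' < k} \gamma_k \gamma_{k'} \left\langle \beta_{n-1} \beta_k^{-1} \xi_{k+1}, \beta_{n-1} \beta_{k'}^{-1} \xi_{k'+1} \right\rangle $$

$$ \le \sum_{k=1}^{n-1} \gamma_k^2 \left\| \beta_{n-1} \beta_k^{-1} \right\|^2 \|\xi_{k+1}\|^2 + 2 \sum_{k=1}^{n-1} \sum_{k' < k} \gamma_k \gamma_{k'} \left\langle \beta_{n-1} \beta_k^{-1} \xi_{k+1}, \beta_{n-1} \beta_{k'}^{-1} \xi_{k'+1} \right\rangle. $$

Since $(\xi_n)$ is a sequence of martingale differences, for all $k' < k$ we have

$$ \mathbb{E}\left[ \left\langle \beta_{n-1} \beta_k^{-1} \xi_{k+1}, \beta_{n-1} \beta_{k'}^{-1} \xi_{k'+1} \right\rangle \right] = \mathbb{E}\left[ \left\langle \beta_{n-1} \beta_k^{-1} \mathbb{E}\left[ \xi_{k+1} \mid \mathcal{F}_k \right], \beta_{n-1} \beta_{k'}^{-1} \xi_{k'+1} \right\rangle \right] = 0. $$

Thus,

$$ \mathbb{E}\left[ \|\beta_{n-1} M_n\|^2 \right] \le \sum_{k=1}^{n-1} \gamma_k^2 \left\| \beta_{n-1} \beta_k^{-1} \right\|^2, \tag{25} $$

because for all $k \in \mathbb{N}$, $\mathbb{E}\left[ \|\xi_{k+1}\|^2 \right] \le 1$. The term $\|\beta_{n-1} \beta_k^{-1}\|$ converges exponentially fast to 0 when $k$ is small enough compared to $n$. We denote by $E(\cdot)$ the integer part function and we isolate the term which gives the rate of convergence. Let us split the sum into two parts:

$$ \sum_{k=1}^{n-1} \gamma_k^2 \left\| \beta_{n-1} \beta_k^{-1} \right\|^2 = \sum_{k=1}^{E(n/2) - 1} \gamma_k^2 \left\| \beta_{n-1} \beta_k^{-1} \right\|^2 + \sum_{k=E(n/2)}^{n-1} \gamma_k^2 \left\| \beta_{n-1} \beta_k^{-1} \right\|^2. \tag{26} $$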


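We shall show that the first term on the right-hand side of (26) converges exponentially fast to 0 and that the second term on the right-hand side converges at the rate $\gamma_n$. Indeed, we deduce from inequality (22):

$$ \sum_{k=1}^{E(n/2) - 1} \gamma_k^2 \left\| \beta_{n-1} \beta_k^{-1} \right\|^2 \le c_2^2 \sum_{k=1}^{E(n/2) - 1} \gamma_k^2 e^{-2 c_m \sum_{j=k+1}^{n-1} \gamma_j} $$

$$ \le c_2^2 \sum_{k=1}^{E(n/2) - 1} \gamma_k^2 e^{-2 c_m \frac{n}{2} \frac{c_\gamma}{n^{\alpha}}} $$

$$ \le c_2^2 e^{-c_m c_\gamma n^{1 - \alpha}} \sum_{k=1}^{E(n/2) - 1} \gamma_k^2. $$

Since $\sum \gamma_k^2 < \infty$, we get

$$ \sum_{k=1}^{E(n/2) - 1} \gamma_k^2 \left\| \beta_{n-1} \beta_k^{-1} \right\|^2 = O\left( e^{-c_m c_\gamma n^{1 - \alpha}} \right). $$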

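We now bound the second term on the right-hand side of (26). Using inequality (23), for all $n > 2 n_0$:

$$ \sum_{k=E(n/2)}^{n-1} \gamma_k^2 \left\| \beta_{n-1} \beta_k^{-1} \right\|^2 \le \sum_{k=E(n/2)}^{n-2} \gamma_k^2 e^{-2 c_m \sum_{j=k+1}^{n-1} \gamma_j} + \gamma_{n-1}^2 $$

$$ \le c_\gamma \left( \frac{1}{E(n/2)} \right)^{\alpha} \sum_{k=E(n/2)}^{n-2} \gamma_k e^{-2 c_m \sum_{j=k+1}^{n-1} \gamma_j} + \gamma_{n-1}^2. $$

Moreover, for all $n > 2 n_0$ and $k \le n - 2$:

$$ \sum_{j=k+1}^{n-1} \gamma_j \ge c_\gamma \int_{k+1}^{n} s^{-\alpha} \, ds = \frac{c_\gamma}{1 - \alpha} \left[ n^{1 - \alpha} - (k + 1)^{1 - \alpha} \right], $$

and hence $e^{-2 c_m \sum_{j=k+1}^{n-1} \gamma_j} \le e^{-2 \frac{c_m c_\gamma}{1 - \alpha} \left[ n^{1 - \alpha} - (k + 1)^{1 - \alpha} \right]}$. Furthermore,

$$ \sum_{k=E(n/2)}^{n-2} \gamma_k e^{2 \frac{c_m c_\gamma}{1 - \alpha} (k + 1)^{1 - \alpha}} \le 2^{\alpha} c_\gamma \sum_{k=E(n/2)}^{n-2} \frac{1}{(k + 1)^{\alpha}} e^{2 \frac{c_m c_\gamma}{1 - \alpha} (k + 1)^{1 - \alpha}} $$

$$ \le 2^{\alpha} c_\gamma \int_{E(n/2)}^{n-1} \frac{1}{(t + 1)^{\alpha}} e^{2 \frac{c_m c_\gamma}{1 - \alpha} (t + 1)^{1 - \alpha}} \, dt $$

$$ \le \frac{2^{\alpha}}{2 c_m} e^{2 \frac{c_m c_\gamma}{1 - \alpha} n^{1 - \alpha}}. $$

Note that the integral test for convergence is valid because there is a rank $n'_0 \ge 1$ such that the function $t \mapsto \frac{1}{(t + 1)^{\alpha}} e^{2 \frac{c_m c_\gamma}{1 - \alpha} (t + 1)^{1 - \alpha}}$ is increasing on $[n'_0, \infty)$. Let $n_1 := \max\{2 n_0 + 1, n'_0\}$. For all $n \ge n_1$: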


n —1 

E Tfc 


k=E{n/2) 





1 

Cm tt“ 


+ c 7 2 2a 



(27)

Consequently, there is a positive constant $C_2$ such that for all $n \ge 1$,

$$ 3 \mathbb{E}\left[ \|\beta_{n-1} M_n\|^2 \right] \le \frac{C_2}{n^{\alpha}}. \tag{28} $$


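Step 3: The remainder term. In the same way, we split the sum into two parts:

$$ \sum_{k=1}^{n-1} \gamma_k \beta_{n-1} \beta_k^{-1} \delta_k = \sum_{k=1}^{E(n/2) - 1} \gamma_k \beta_{n-1} \beta_k^{-1} \delta_k + \sum_{k=E(n/2)}^{n-1} \gamma_k \beta_{n-1} \beta_k^{-1} \delta_k. \tag{29} $$

It can be checked (see the proof of Lemma 5.3 for more details) that there is a positive constant $M$ such that for all $n \ge 1$,

$$ \mathbb{E}\left[ \|Z_n - m\|^4 \right] \le M. \tag{30} $$

Moreover, by Lemma 5.1, almost surely $\|\delta_n\| \le C_m \|Z_n - m\|^2$. Thus, for all $k, k' \ge 1$, applying Cauchy-Schwarz's inequality,

$$ \mathbb{E}\left[ \|\delta_k\| \|\delta_{k'}\| \right] \le C_m^2 \mathbb{E}\left[ \|Z_k - m\|^2 \|Z_{k'} - m\|^2 \right] $$

$$ \le C_m^2 \sqrt{\mathbb{E}\left[ \|Z_k - m\|^4 \right]} \sqrt{\mathbb{E}\left[ \|Z_{k'} - m\|^4 \right]} $$

$$ \le C_m^2 \sup_{n \ge 1} \mathbb{E}\left[ \|Z_n - m\|^4 \right] $$

$$ \le C_m^2 M. $$

As a particular case, we also have $\mathbb{E}\left[ |\langle \delta_k, \delta_{k'} \rangle| \right] \le C_m^2 M$. Applying this result to the first term on the right-hand side of (29),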




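$$ \mathbb{E}\left[ \Big\| \sum_{k=1}^{E(n/2) - 1} \gamma_k \beta_{n-1} \beta_k^{-1} \delta_k \Big\|^2 \right] \le C_m^2 M \left( \sum_{k=1}^{E(n/2) - 1} \gamma_k \left\| \beta_{n-1} \beta_k^{-1} \right\| \right)^2 $$

$$ \le c_2^2 C_m^2 M e^{-2 c_m \frac{n}{2} \frac{c_\gamma}{n^{\alpha}}} \left( \sum_{k=1}^{E(n/2) - 1} \gamma_k \right)^2 $$

$$ \le C'_1 e^{-c_m c_\gamma n^{1 - \alpha}} n^{2 - 2\alpha}. $$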

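This term converges exponentially fast to 0. To bound the second term, we use the same idea as for the martingale term. Applying previous inequalities for the terms $\mathbb{E}\left[ \|\delta_k\| \|\delta_{k'}\| \right]$ which appear in the double products, we get:

$$ \mathbb{E}\left[ \Big\| \sum_{k=E(n/2)}^{n-1} \gamma_k \beta_{n-1} \beta_k^{-1} \delta_k \Big\|^2 \right] \le C_m^2 \sup_{E(n/2) \le k \le n-1} \mathbb{E}\left[ \|Z_k - m\|^4 \right] \left( \sum_{k=E(n/2)}^{n-1} \gamma_k \left\| \beta_{n-1} \beta_k^{-1} \right\| \right)^2 $$

$$ \le C_3 \sup_{E(n/2) \le k \le n-1} \mathbb{E}\left[ \|Z_k - m\|^4 \right], $$

since $\left( \sum_{k=E(n/2)}^{n-1} \gamma_k \left\| \beta_{n-1} \beta_k^{-1} \right\| \right)^2$ is bounded. This fact can be checked with similar calculus to the ones in the proof of inequality (27). □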


To prove Lemma 3.2 we introduce two technical lemmas. The first one gives a sharp convexity bound when $\|Z_n - m\|$ is not too large.

Lemma 5.2. If assumptions (A1) and (A2) hold, there are a rank $n_\alpha$ and a constant $c$ such that for all $n \ge n_\alpha$, $\|Z_n - m\| \le c n^{1 - \alpha}$ yields

$$ \langle \Phi(Z_n), Z_n - m \rangle \ge \frac{1}{c_\gamma n^{1 - \alpha}} \|Z_n - m\|^2. \tag{31} $$

As a corollary, there is also a deterministic rank $n'_\alpha$ such that for all $n \ge n'_\alpha$, $\|Z_n - m\| \le c n^{1 - \alpha}$ yields

$$ \|Z_n - m - \gamma_n \Phi(Z_n)\|^2 \le \left( 1 - \frac{1}{n} \right) \|Z_n - m\|^2. \tag{32} $$

Proof of Lemma 5.2. We suppose that $\|Z_n - m\| \le c n^{1 - \alpha}$. We have to consider two cases.

If $\|Z_n - m\| \le 1$, we have $\|Z_n\| \le \|m\| + 1$, so that by Corollary 2.2 in Cardot et al. (2013), there is a positive constant $c_1$ such that $\langle \Phi(Z_n), Z_n - m \rangle \ge c_1 \|Z_n - m\|^2$.

If now $\|Z_n - m\| \ge 1$, since $\Phi(Z_n) = \int_0^1 \Gamma_{m + t(Z_n - m)} (Z_n - m) \, dt$, by continuity and linearity of the inner product,

$$ \langle \Phi(Z_n), Z_n - m \rangle = \int_0^1 \left\langle Z_n - m, \Gamma_{m + t(Z_n - m)} (Z_n - m) \right\rangle dt. $$

Moreover, operators $\Gamma_h$ are non negative for all $h \in H$. Applying Proposition 2.1 of Cardot et al. (2013) and since for all $t \in \left[ 0, \frac{1}{\|Z_n - m\|} \right]$ we have $\|m + t(Z_n - m)\| \le \|m\| + 1$, there is a positive constant $c_2$ such that:

$$ \langle \Phi(Z_n), Z_n - m \rangle = \int_0^1 \left\langle Z_n - m, \Gamma_{m + t(Z_n - m)} (Z_n - m) \right\rangle dt $$

$$ \ge \int_0^{1 / \|Z_n - m\|} \left\langle Z_n - m, \Gamma_{m + t(Z_n - m)} (Z_n - m) \right\rangle dt $$

$$ \ge \int_0^{1 / \|Z_n - m\|} c_2 \|Z_n - m\|^2 \, dt $$

$$ = \frac{c_2}{\|Z_n - m\|} \|Z_n - m\|^2 $$

$$ \ge \frac{c_2}{c n^{1 - \alpha}} \|Z_n - m\|^2. $$

We can choose a rank $n_\alpha$ such that for all $n \ge n_\alpha$ we have $c_1 \ge \frac{c_2}{c n^{1 - \alpha}}$, which concludes the proof of inequality (31) with $c = c_2 c_\gamma$.


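We now prove inequality (32). For all $n \ge n_\alpha$, $\|Z_n - m\| \le c n^{1 - \alpha}$ yields

$$ \|Z_n - m - \gamma_n \Phi(Z_n)\|^2 = \|Z_n - m\|^2 - 2 \gamma_n \langle \Phi(Z_n), Z_n - m \rangle + \gamma_n^2 \|\Phi(Z_n)\|^2 $$

$$ \le \|Z_n - m\|^2 - \frac{2}{n} \|Z_n - m\|^2 + \gamma_n^2 C^2 \|Z_n - m\|^2 $$

$$ = \left( 1 - \frac{2}{n} + \frac{c_\gamma^2 C^2}{n^{2\alpha}} \right) \|Z_n - m\|^2. $$

Consequently, we can choose a rank $n'_\alpha \ge n_\alpha$ such that for all $n \ge n'_\alpha$ we have $C^2 c_\gamma^2 n^{-2\alpha} \le n^{-1}$. Note that this is possible since $\alpha > 1/2$.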

Lemma 5.3. There is a positive constant $C_\alpha$ such that for all $n \ge 1$,


P 


|Z„ — m|| > cn 1 - 


_Q 

— w 4 ~ 


where constant $c$ has been defined in Lemma 5.2.

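Proof of Lemma 5.3. In order to use Markov's inequality, we prove by induction that for all integer $p \ge 1$, there is a positive constant $M_p$ such that for all $n$:

$$ \mathbb{E}\left[ \|Z_n - m\|^{2p} \right] \le M_p. $$

Cardot et al. (2013) have proved the previous inequality for the particular case $p = 1$. Decomposition (8) yields

$$ \|Z_{n+1} - m\|^2 = \|Z_n - m\|^2 + \gamma_n^2 \|\Phi(Z_n)\|^2 + \gamma_n^2 \|\xi_{n+1}\|^2 - 2 \gamma_n \langle Z_n - m, \Phi(Z_n) \rangle - 2 \gamma_n^2 \langle \xi_{n+1}, \Phi(Z_n) \rangle + 2 \gamma_n \langle \xi_{n+1}, Z_n - m \rangle. $$

Moreover, $\langle \xi_{n+1}, \Phi(Z_n) \rangle = -\langle U_{n+1}, \Phi(Z_n) \rangle + \|\Phi(Z_n)\|^2$. Since $\|\Phi(Z_n)\| \le 1$, $\|\xi_{n+1}\| \le 2$ and $\langle \Phi(Z_n), Z_n - m \rangle \ge 0$, applying Cauchy-Schwarz's inequality, for all $n \ge 1$

$$ \|Z_{n+1} - m\|^2 \le \|Z_n - m\|^2 + 6 \gamma_n^2 + 2 \gamma_n \langle \xi_{n+1}, Z_n - m \rangle. \tag{33} $$

Using previous inequality,

$$ \|Z_{n+1} - m\|^{2p} \le \left( \|Z_n - m\|^2 + 6 \gamma_n^2 + 2 \gamma_n \langle \xi_{n+1}, Z_n - m \rangle \right)^p $$

$$ = \sum_{k=0}^{p} \binom{p}{k} \left( 2 \gamma_n \langle \xi_{n+1}, Z_n - m \rangle \right)^k \left( \|Z_n - m\|^2 + 6 \gamma_n^2 \right)^{p - k} $$

$$ = \left( \|Z_n - m\|^2 + 6 \gamma_n^2 \right)^p + 2 p \gamma_n \langle \xi_{n+1}, Z_n - m \rangle \left( \|Z_n - m\|^2 + 6 \gamma_n^2 \right)^{p - 1} + \sum_{k=2}^{p} \binom{p}{k} \left( 2 \gamma_n \langle \xi_{n+1}, Z_n - m \rangle \right)^k \left( \|Z_n - m\|^2 + 6 \gamma_n^2 \right)^{p - k}. \tag{34} $$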
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We now bound the three terms in d34l ). First, using induction assumptions. 


E 


2 \P 


\Z„ - m\\ 2 +6yl) 


= E 


p-i 


\ z n - m\\ 2 P + £ (JJ IIZ„ - m\\ 2k (6 7 ly- k 
= E [ || Z„ - mfP] + £ Qe [||Z„ - m|p] (6 7 2 ) p “* 
< E [||Z n - mfP] + £ (^M fc (6 7 ^)P- fc . 


Since for all k < p — 1 we have ( 7 2 ) p k = 0 ( 7 2 ), there is a positive constant C /; such that 
for all n > 1, 


E 


|Z„ - m|| 2 + 6 7 2 ) p < E [||Z„ - ra|| 2p ] + C p7 2 . 


(35) 


Remark that Cp does not depend on n. Let us now deal with the second term in (l34l) . Since 
(£„) is a sequence of martingale differences adapted to the filtration {T n ) and since Z„ is 
Tn -measurable, for all n > 1, 


E 


2 7 „(^„+i,Z„ - m) (\\Z n - m\\ 2 + 6 7 2 ) f ' 1 1 T n 


= 0 . 


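It remains to bound the last term in (34). Applying Cauchy-Schwarz's inequality, we get, for all $n \geq 1$,

$$
\begin{aligned}
& \sum_{k=2}^{p} \binom{p}{k} \mathbb{E}\left[ \left( 2\gamma_n \langle \xi_{n+1}, Z_n - m \rangle \right)^k \left( \|Z_n - m\|^2 + 6\gamma_n^2 \right)^{p-k} \right] \\
&\quad = \sum_{k=2}^{p} \binom{p}{k} \mathbb{E}\left[ \left( 2\gamma_n \langle \xi_{n+1}, Z_n - m \rangle \right)^k \sum_{j=0}^{p-k} \binom{p-k}{j} \|Z_n - m\|^{2(p-k-j)} (6\gamma_n^2)^j \right] \\
&\quad = \sum_{k=2}^{p} \sum_{j=0}^{p-k} \binom{p-k}{j} \binom{p}{k} 2^{k+j} 3^j \gamma_n^{k+2j} \, \mathbb{E}\left[ \langle \xi_{n+1}, Z_n - m \rangle^k \|Z_n - m\|^{2(p-k-j)} \right] \\
&\quad \leq \sum_{k=2}^{p} \sum_{j=0}^{p-k} \binom{p-k}{j} \binom{p}{k} 2^{k+j} 3^j \gamma_n^{k+2j} \, \mathbb{E}\left[ \|\xi_{n+1}\|^k \|Z_n - m\|^{2p-k-2j} \right] .
\end{aligned}
$$

Since $\|\xi_{n+1}\| \leq 2$, we have for all $n \geq 1$,

$$
\begin{aligned}
& \sum_{k=2}^{p} \sum_{j=0}^{p-k} \binom{p-k}{j} \binom{p}{k} 2^{k+j} 3^j \gamma_n^{k+2j} \, \mathbb{E}\left[ \|\xi_{n+1}\|^k \|Z_n - m\|^{2p-k-2j} \right] \\
&\quad \leq \sum_{k=2}^{p} \sum_{j=0}^{p-k} \binom{p-k}{j} \binom{p}{k} 2^{2k+j} 3^j \gamma_n^{k+2j} \, \mathbb{E}\left[ \|Z_n - m\|^{2p-k-2j} \right] .
\end{aligned}
$$

Finally, using Cauchy-Schwarz's inequality and by induction, we get

$$
\begin{aligned}
& \sum_{k=2}^{p} \sum_{j=0}^{p-k} \binom{p-k}{j} \binom{p}{k} 2^{2k+j} 3^j \gamma_n^{k+2j} \, \mathbb{E}\left[ \|Z_n - m\|^{2p-k-2j} \right] \\
&\quad \leq \sum_{k=2}^{p} \sum_{j=0}^{p-k} \binom{p-k}{j} \binom{p}{k} 2^{2k+j} 3^j \gamma_n^{k+2j} \sqrt{\mathbb{E}\left[ \|Z_n - m\|^{2(p-j-1)} \right]} \sqrt{\mathbb{E}\left[ \|Z_n - m\|^{2(p-k-j+1)} \right]} \\
&\quad \leq \sum_{k=2}^{p} \sum_{j=0}^{p-k} \binom{p-k}{j} \binom{p}{k} 2^{2k+j} 3^j \gamma_n^{k+2j} \sqrt{M_{p-j-1}} \sqrt{M_{p-k-j+1}} .
\end{aligned}
$$

Moreover, for all $k \geq 2$ and $j \geq 0$, $\gamma_n^{k+2j} = O(\gamma_n^2)$, so there is a constant $C'_p$ such that for all $n \geq 1$:

$$
\sum_{k=2}^{p} \binom{p}{k} \mathbb{E}\left[ \left( 2\gamma_n \langle \xi_{n+1}, Z_n - m \rangle \right)^k \left( \|Z_n - m\|^2 + 6\gamma_n^2 \right)^{p-k} \right] \leq C'_p \gamma_n^2 .
$$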
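Remark that $C'_p$ does not depend on $n$. Since $Z_1$ is bounded by construction, we get by induction

$$
\begin{aligned}
\mathbb{E}\left[ \|Z_{n+1} - m\|^{2p} \right] &\leq \mathbb{E}\left[ \|Z_n - m\|^{2p} \right] + (C_p + C'_p) \gamma_n^2 \\
&\leq \mathbb{E}\left[ \|Z_1 - m\|^{2p} \right] + (C_p + C'_p) \sum_{k=1}^{n} \gamma_k^2 \\
&\leq \mathbb{E}\left[ \|Z_1 - m\|^{2p} \right] + (C_p + C'_p) \sum_{k=1}^{\infty} \gamma_k^2 \\
&\leq M_p ,
\end{aligned}
$$

which concludes the induction.

Applying Markov's inequality, for all integers $p \geq 1$:

$$
\mathbb{P}\left[ \|Z_n - m\| \geq c n^{1-\alpha} \right] \leq \frac{\mathbb{E}\left[ \|Z_n - m\|^{2p} \right]}{(c n^{1-\alpha})^{2p}} \leq \frac{M_p}{(c n^{1-\alpha})^{2p}} .
$$

The announced result is obtained by considering $p \geq \frac{4-\alpha}{2(1-\alpha)}$. Note that this is possible since $\alpha \neq 1$. □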
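As a small worked illustration of the last step, the moment order needed in Markov's inequality is the smallest integer $p$ with $2p(1-\alpha) \geq 4 - \alpha$. The helper below is a hypothetical convenience function (its name is not from the paper) that returns this order.

```python
import math


def markov_moment_order(alpha):
    """Smallest integer p >= 1 with 2 * p * (1 - alpha) >= 4 - alpha."""
    if not 0.5 < alpha < 1.0:
        raise ValueError("alpha must lie in (1/2, 1)")
    return max(1, math.ceil((4.0 - alpha) / (2.0 * (1.0 - alpha))))
```

For instance, $\alpha = 3/4$ requires moments of order $2p = 14$; the required order grows without bound as $\alpha$ approaches 1.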

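Proof of Lemma 3.2. For all $n \geq 1$,

$$
\mathbb{E}\left[ \|Z_{n+1} - m\|^4 \right] = \mathbb{E}\left[ \|Z_{n+1} - m\|^4 \mathbf{1}_{\{\|Z_n - m\| > c n^{1-\alpha}\}} \right] + \mathbb{E}\left[ \|Z_{n+1} - m\|^4 \mathbf{1}_{\{\|Z_n - m\| \leq c n^{1-\alpha}\}} \right] , \qquad (36)
$$

where constant $c$ has been defined in Lemma 5.2. Let us bound the first term on the right-hand side of (36). Since $\|Z_{n+1} - m\| \leq \|Z_n - m\| + \gamma_n \leq \|Z_1 - m\| + \sum_{k=1}^{n} \gamma_k$ and since $Z_1$ is bounded, there is a constant $C'_\alpha$ such that for all integers $n \geq 1$,

$$
\|Z_n - m\| \leq C'_\alpha n^{1-\alpha} .
$$

Consequently,

$$
\begin{aligned}
\mathbb{E}\left[ \|Z_{n+1} - m\|^4 \mathbf{1}_{\{\|Z_n - m\| > c n^{1-\alpha}\}} \right] &\leq \mathbb{E}\left[ \left( C'_\alpha (n+1)^{1-\alpha} \right)^4 \mathbf{1}_{\{\|Z_n - m\| > c n^{1-\alpha}\}} \right] \\
&\leq \left( C'_\alpha (n+1)^{1-\alpha} \right)^4 \mathbb{P}\left[ \|Z_n - m\| > c n^{1-\alpha} \right] .
\end{aligned}
$$

Thus, applying Lemma 5.3 we get

$$
\begin{aligned}
\left( C'_\alpha (n+1)^{1-\alpha} \right)^4 \mathbb{P}\left[ \|Z_n - m\| > c n^{1-\alpha} \right] &\leq \frac{C'^4_\alpha C_\alpha (n+1)^{4-4\alpha}}{n^{4-\alpha}} \\
&\leq 2^{4-4\alpha} \frac{C'^4_\alpha C_\alpha n^{4-4\alpha}}{n^{4-\alpha}} \\
&\leq 2^{4-4\alpha} \frac{C'^4_\alpha C_\alpha}{n^{3\alpha}} .
\end{aligned}
$$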

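We now bound the second term. Suppose that $\|Z_n - m\| \leq c n^{1-\alpha}$. Since $\|\xi_{n+1}\| \leq 2$, using Lemma 5.2 there is a rank $n_\alpha$ such that for all $n \geq n_\alpha$,

$$
\begin{aligned}
& \|Z_{n+1} - m\|^2 \mathbf{1}_{\{\|Z_n - m\| \leq c n^{1-\alpha}\}} \\
&\quad = \left( \|Z_n - m - \gamma_n \Phi(Z_n)\|^2 + \gamma_n^2 \|\xi_{n+1}\|^2 + 2\gamma_n \langle Z_n - m - \gamma_n \Phi(Z_n), \xi_{n+1} \rangle \right) \mathbf{1}_{\{\|Z_n - m\| \leq c n^{1-\alpha}\}} \\
&\quad \leq \left( \left( 1 - \frac{1}{n} \right) \|Z_n - m\|^2 + 4\gamma_n^2 + 2\gamma_n \langle Z_n - m - \gamma_n \Phi(Z_n), \xi_{n+1} \rangle \right) \mathbf{1}_{\{\|Z_n - m\| \leq c n^{1-\alpha}\}} .
\end{aligned}
$$

Moreover, since $(\xi_{n+1})$ is a sequence of martingale differences for the filtration $(\mathcal{F}_n)$,

$$
\begin{aligned}
\mathbb{E}\left[ \langle Z_n - m - \gamma_n \Phi(Z_n), \xi_{n+1} \rangle \mathbf{1}_{\{\|Z_n - m\| \leq c n^{1-\alpha}\}} \,\Big|\, \mathcal{F}_n \right] &= 0 , \\
\mathbb{E}\left[ \langle Z_n - m - \gamma_n \Phi(Z_n), \xi_{n+1} \rangle \|Z_n - m\|^2 \mathbf{1}_{\{\|Z_n - m\| \leq c n^{1-\alpha}\}} \,\Big|\, \mathcal{F}_n \right] &= 0 .
\end{aligned}
$$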

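Applying Cauchy-Schwarz's inequality,

$$
\begin{aligned}
\mathbb{E}\left[ \|Z_{n+1} - m\|^4 \mathbf{1}_{\{\|Z_n - m\| \leq c n^{1-\alpha}\}} \right] \leq {} & \left( 1 - \frac{1}{n} \right)^2 \mathbb{E}\left[ \|Z_n - m\|^4 \mathbf{1}_{\{\|Z_n - m\| \leq c n^{1-\alpha}\}} \right] + 16\gamma_n^4 \\
& + 8\gamma_n^2 \left( 1 - \frac{1}{n} \right) \mathbb{E}\left[ \|Z_n - m\|^2 \mathbf{1}_{\{\|Z_n - m\| \leq c n^{1-\alpha}\}} \right] \\
& + 4\gamma_n^2 \mathbb{E}\left[ \langle Z_n - m - \gamma_n \Phi(Z_n), \xi_{n+1} \rangle^2 \mathbf{1}_{\{\|Z_n - m\| \leq c n^{1-\alpha}\}} \right] \\
\leq {} & \left( 1 - \frac{1}{n} \right)^2 \mathbb{E}\left[ \|Z_n - m\|^4 \right] + 16\gamma_n^4 + 8\gamma_n^2 \mathbb{E}\left[ \|Z_n - m\|^2 \right] \\
& + 4\gamma_n^2 \mathbb{E}\left[ \|Z_n - m - \gamma_n \Phi(Z_n)\|^2 \, \mathbb{E}\left[ \|\xi_{n+1}\|^2 \,|\, \mathcal{F}_n \right] \mathbf{1}_{\{\|Z_n - m\| \leq c n^{1-\alpha}\}} \right] .
\end{aligned}
$$

Finally, since $\mathbb{E}\left[ \|\xi_{n+1}\|^2 \,|\, \mathcal{F}_n \right] \leq 1$, we get with Lemma 5.2

$$
\begin{aligned}
\mathbb{E}\left[ \|Z_{n+1} - m\|^4 \mathbf{1}_{\{\|Z_n - m\| \leq c n^{1-\alpha}\}} \right] \leq {} & \left( 1 - \frac{1}{n} \right)^2 \mathbb{E}\left[ \|Z_n - m\|^4 \right] + 16\gamma_n^4 + 8\gamma_n^2 \mathbb{E}\left[ \|Z_n - m\|^2 \right] \\
& + 4\gamma_n^2 \left( 1 - \frac{1}{n} \right) \mathbb{E}\left[ \|Z_n - m\|^2 \right] \\
\leq {} & \left( 1 - \frac{1}{n} \right)^2 \mathbb{E}\left[ \|Z_n - m\|^4 \right] + 16\gamma_n^4 + 12\gamma_n^2 \mathbb{E}\left[ \|Z_n - m\|^2 \right] .
\end{aligned}
$$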
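Since $\gamma_n^4 = o\left( \frac{1}{n^{3\alpha}} \right)$, there are two positive constants $C'_1$ and $C'_2$ such that for all $n \geq n_\alpha$,

$$
\begin{aligned}
\mathbb{E}\left[ \|Z_{n+1} - m\|^4 \right] &= \mathbb{E}\left[ \|Z_{n+1} - m\|^4 \mathbf{1}_{\{\|Z_n - m\| > c n^{1-\alpha}\}} \right] + \mathbb{E}\left[ \|Z_{n+1} - m\|^4 \mathbf{1}_{\{\|Z_n - m\| \leq c n^{1-\alpha}\}} \right] \\
&\leq \frac{2^{4-4\alpha} C'^4_\alpha C_\alpha}{n^{3\alpha}} + \left( 1 - \frac{1}{n} \right)^2 \mathbb{E}\left[ \|Z_n - m\|^4 \right] + 16\gamma_n^4 + 12\gamma_n^2 \mathbb{E}\left[ \|Z_n - m\|^2 \right] \\
&\leq \left( 1 - \frac{1}{n} \right)^2 \mathbb{E}\left[ \|Z_n - m\|^4 \right] + \frac{C'_1}{n^{3\alpha}} + \frac{C'_2}{n^{2\alpha}} \mathbb{E}\left[ \|Z_n - m\|^2 \right] .
\end{aligned}
$$

□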
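The deterministic growth bound $\|Z_{n+1} - m\| \leq \|Z_n - m\| + \gamma_n$ used for the first term of (36) reflects the fact that each update moves the iterate by exactly $\gamma_n$. The following minimal sketch illustrates this in $\mathbb{R}^d$, assuming the recursion $Z_{n+1} = Z_n + \gamma_n \frac{X_{n+1} - Z_n}{\|X_{n+1} - Z_n\|}$ of Cardot et al. (2013) with $\gamma_n = c_\gamma n^{-\alpha}$; the function name, the default parameter values and the handling of the event $X_{n+1} = Z_n$ are our own assumptions.

```python
import math
import random


def sgd_median_path(samples, z1, c_gamma=1.0, alpha=0.75):
    """Return the iterates Z_1, Z_2, ... of the (non averaged) recursion.

    samples: iterable of points X_2, X_3, ... given as lists of floats.
    z1: starting point Z_1 (list of floats).
    """
    path = [list(z1)]
    for n, x in enumerate(samples, start=1):
        z = path[-1]
        diff = [xi - zi for xi, zi in zip(x, z)]
        norm = math.sqrt(sum(d * d for d in diff))
        gamma = c_gamma * n ** (-alpha)
        if norm == 0.0:
            # The unit vector is undefined when X_{n+1} = Z_n; we skip the
            # update in that case (an assumption made for this sketch only).
            path.append(list(z))
        else:
            path.append([zi + gamma * d / norm for zi, d in zip(z, diff)])
    return path
```

Since every step has norm at most $\gamma_n$, the distance to the starting point after $n$ steps is at most $\sum_{k=1}^{n} \gamma_k \leq \frac{c_\gamma}{1-\alpha} n^{1-\alpha}$, which is the order of the constant $C'_\alpha n^{1-\alpha}$ above.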


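Proof of Theorem 3.1. Let $\beta \in (\alpha, 3\alpha - 1)$. Let us check that there is a rank $n_\beta \geq n_\alpha$ ($n_\alpha$ has been defined in Lemma 3.2) such that for all $n \geq n_\beta$, we have

$$
\left( 1 - \frac{1}{n} \right)^2 \left( \frac{n+1}{n} \right)^{\beta} + (C'_1 + C'_2) 2^{3\alpha} \frac{1}{(n+1)^{3\alpha - \beta}} \leq 1
$$

($C'_1$, $C'_2$ are defined in Lemma 3.2). Indeed, since $\beta < 3\alpha - 1 < 2$,

$$
\begin{aligned}
\left( 1 - \frac{1}{n} \right)^2 \left( \frac{n+1}{n} \right)^{\beta} + (C'_1 + C'_2) 2^{3\alpha} \frac{1}{(n+1)^{3\alpha - \beta}} &= \left( 1 - \frac{2}{n} + o\left( \frac{1}{n} \right) \right) \left( 1 + \frac{\beta}{n} + o\left( \frac{1}{n} \right) \right) + o\left( \frac{1}{n} \right) \\
&= 1 - (2 - \beta) \frac{1}{n} + o\left( \frac{1}{n} \right) .
\end{aligned}
$$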
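The rank $n_\beta$ is not explicit, but it can be located numerically once values are fixed. The sketch below scans a finite horizon and returns the first index after the last violation of the inequality; the parameter `K` stands for $C'_1 + C'_2$, whose numerical value is an assumption here, and the scan says nothing about indices beyond the horizon (the expansion above covers those).

```python
def contraction_lhs(n, alpha, beta, K):
    """Left-hand side of the inequality defining n_beta, with K = C'_1 + C'_2."""
    return ((1.0 - 1.0 / n) ** 2 * ((n + 1.0) / n) ** beta
            + K * 2.0 ** (3.0 * alpha) / (n + 1.0) ** (3.0 * alpha - beta))


def rank_n_beta(alpha, beta, K, horizon=10_000):
    """First n such that the inequality holds for all n in [n, horizon].

    Returns horizon + 1 if the inequality still fails at the horizon.
    """
    if not (alpha < beta < 3.0 * alpha - 1.0):
        raise ValueError("beta must lie in (alpha, 3 * alpha - 1)")
    last_fail = 1  # the scan starts at n = 2
    for n in range(2, horizon + 1):
        if contraction_lhs(n, alpha, beta, K) > 1.0:
            last_fail = n
    return last_fail + 1
```

A larger value of `K`, or a value of $\beta$ closer to $3\alpha - 1$, pushes the rank further out, since the correction term then decays only slightly faster than $1/n$.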

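We now prove by induction that there are two positive constants $C'$ and $C''$ such that $2C' \geq C'' \geq C' \geq 1$ and such that for all $n \geq n_\beta$,

$$
\mathbb{E}\left[ \|Z_n - m\|^2 \right] \leq \frac{C'}{n^{\alpha}} , \qquad \mathbb{E}\left[ \|Z_n - m\|^4 \right] \leq \frac{C''}{n^{\beta}} .
$$

Let us choose $C' \geq n_\beta^{\alpha} \mathbb{E}\left[ \|Z_{n_\beta} - m\|^2 \right]$ and $C'' \geq n_\beta^{\beta} \mathbb{E}\left[ \|Z_{n_\beta} - m\|^4 \right]$. This is possible since there is a positive constant $M$ such that for all $n \geq 1$, $\mathbb{E}\left[ \|Z_n - m\|^2 \right] \leq M$ and $\mathbb{E}\left[ \|Z_n - m\|^4 \right] \leq M$. Let $n \geq n_\beta$; using Lemma 3.2 and by induction,

$$
\begin{aligned}
\mathbb{E}\left[ \|Z_{n+1} - m\|^4 \right] &\leq \left( 1 - \frac{1}{n} \right)^2 \mathbb{E}\left[ \|Z_n - m\|^4 \right] + \frac{C'_1}{n^{3\alpha}} + \frac{C'_2}{n^{2\alpha}} \mathbb{E}\left[ \|Z_n - m\|^2 \right] \\
&\leq \left( 1 - \frac{1}{n} \right)^2 \frac{C''}{n^{\beta}} + \frac{C'_1}{n^{3\alpha}} + \frac{C'_2 C'}{n^{3\alpha}} .
\end{aligned}
$$

Moreover, since $C' \leq C''$ and since $C'' \geq 1$,

$$
\mathbb{E}\left[ \|Z_{n+1} - m\|^4 \right] \leq \left( 1 - \frac{1}{n} \right)^2 \frac{C''}{n^{\beta}} + \frac{C'_1 C''}{n^{3\alpha}} + \frac{C'_2 C''}{n^{3\alpha}} .
$$

Factorizing by $\frac{C''}{(n+1)^{\beta}}$ we get

$$
\begin{aligned}
\mathbb{E}\left[ \|Z_{n+1} - m\|^4 \right] &\leq \left( 1 - \frac{1}{n} \right)^2 \left( \frac{n+1}{n} \right)^{\beta} \frac{C''}{(n+1)^{\beta}} + (C'_1 + C'_2) \left( \frac{n+1}{n} \right)^{3\alpha} \frac{1}{(n+1)^{3\alpha - \beta}} \frac{C''}{(n+1)^{\beta}} \\
&\leq \left( 1 - \frac{1}{n} \right)^2 \left( \frac{n+1}{n} \right)^{\beta} \frac{C''}{(n+1)^{\beta}} + (C'_1 + C'_2) 2^{3\alpha} \frac{1}{(n+1)^{3\alpha - \beta}} \frac{C''}{(n+1)^{\beta}} \\
&\leq \left( \left( 1 - \frac{1}{n} \right)^2 \left( \frac{n+1}{n} \right)^{\beta} + (C'_1 + C'_2) 2^{3\alpha} \frac{1}{(n+1)^{3\alpha - \beta}} \right) \frac{C''}{(n+1)^{\beta}} .
\end{aligned}
$$

By definition of $n_\beta$,

$$
\mathbb{E}\left[ \|Z_{n+1} - m\|^4 \right] \leq \frac{C''}{(n+1)^{\beta}} . \qquad (37)
$$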

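We now prove that $\mathbb{E}\left[ \|Z_{n+1} - m\|^2 \right] \leq \frac{C'}{(n+1)^{\alpha}}$. Since $C'' \leq 2C'$, by Lemma 3.1 and by induction, there is a constant $C''' > 0$ such that

$$
\begin{aligned}
\mathbb{E}\left[ \|Z_{n+1} - m\|^2 \right] &\leq \frac{C'''}{(n+1)^{\alpha}} + C_3 \sup_{n/2+1 \leq k \leq n+1} \mathbb{E}\left[ \|Z_k - m\|^4 \right] \\
&\leq \frac{C'''}{(n+1)^{\alpha}} + 2^{\beta} C_3 \frac{C''}{(n+1)^{\beta}} \\
&\leq \frac{C'''}{(n+1)^{\alpha}} + 2^{\beta+1} C_3 \frac{C'}{(n+1)^{\beta - \alpha}} \frac{1}{(n+1)^{\alpha}} .
\end{aligned}
$$

To get $\mathbb{E}\left[ \|Z_{n+1} - m\|^2 \right] \leq \frac{C'}{(n+1)^{\alpha}}$, we only need to take $C' \geq C''' + 2^{\beta+1} C_3 \frac{C'}{(n+1)^{\beta - \alpha}}$, which concludes the induction.

The proof is complete for all $n \geq 1$ by taking $C' \geq \max_{n \leq n_\beta} \left\{ n^{\alpha} \mathbb{E}\left[ \|Z_n - m\|^2 \right] \right\}$ and $C'' \geq \max_{n \leq n_\beta} \left\{ n^{\beta} \mathbb{E}\left[ \|Z_n - m\|^4 \right] \right\}$. □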


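Proof of Proposition 3.1. A lower bound for $\|Z_n - m - \gamma_n \Phi(Z_n)\|$ is obtained by using decomposition (8). Using Corollary 2.1, for all $h \in H$,

$$
\begin{aligned}
\|\Phi(m + h)\| &\leq \left\| \int_0^1 \Gamma_{m+th}(h) \, dt \right\| \\
&\leq \int_0^1 \|\Gamma_{m+th}(h)\| \, dt \\
&\leq C \|h\| .
\end{aligned}
$$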

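Consequently, there is a rank $n_0$ such that for all $n \geq n_0$,

$$
\begin{aligned}
\|h - \gamma_n \Phi(m + h)\| &\geq \left| \|h\| - \gamma_n \|\Phi(m + h)\| \right| \\
&\geq \|h\| - C\gamma_n \|h\| .
\end{aligned}
$$

In particular, for all $n \geq n_0$,

$$
\|Z_n - m - \gamma_n \Phi(Z_n)\| \geq (1 - C\gamma_n) \|Z_n - m\| .
$$

Since $\lim_{n \to \infty} \mathbb{E}\left[ \|Z_n - m\|^2 \right] = 0$, there is a rank $n'_0$ such that for all $n \geq n'_0$,

$$
\mathbb{E}\left[ \|\xi_{n+1}\|^2 \right] = 1 - \mathbb{E}\left[ \|\Phi(Z_n)\|^2 \right] \geq 1 - C^2 \mathbb{E}\left[ \|Z_n - m\|^2 \right] .
$$

Finally, since $(\xi_{n+1})$ is a sequence of martingale differences adapted to the filtration $(\mathcal{F}_n)$, there is a rank $n_1 \geq n'_0$ such that for all $n \geq n_1$,

$$
\begin{aligned}
\mathbb{E}\left[ \|Z_{n+1} - m\|^2 \right] &\geq (1 - C\gamma_n)^2 \mathbb{E}\left[ \|Z_n - m\|^2 \right] + \gamma_n^2 \left( 1 - C^2 \mathbb{E}\left[ \|Z_n - m\|^2 \right] \right) \\
&\geq (1 - 2C\gamma_n) \mathbb{E}\left[ \|Z_n - m\|^2 \right] + \gamma_n^2 .
\end{aligned}
$$

We can prove by induction that there is a positive constant $C_0$ such that for all $n \geq n_1$,

$$
\mathbb{E}\left[ \|Z_n - m\|^2 \right] \geq \frac{C_0}{n^{\alpha}} .
$$

To conclude the proof, we just have to consider $C := \min \left\{ \min_{1 \leq n \leq n_1} \left\{ \mathbb{E}\left[ \|Z_n - m\|^2 \right] n^{\alpha} \right\}, C_0 \right\}$.

□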
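The final induction can be illustrated on the extremal sequence $u_{n+1} = (1 - 2C\gamma_n) u_n + \gamma_n^2$. If $2C\gamma_n \leq 1$ and $u_n \geq \frac{\gamma_n}{2C}$, then $u_{n+1} \geq (1 - 2C\gamma_n) \frac{\gamma_n}{2C} + \gamma_n^2 = \frac{\gamma_n}{2C} \geq \frac{\gamma_{n+1}}{2C}$, so that $n^{\alpha} u_n \geq \frac{c_\gamma}{2C}$ along the whole sequence. The sketch below iterates this recursion; the values of $C$, $c_\gamma$, the starting index and the starting value are illustrative assumptions.

```python
def lower_bound_recursion(C, c_gamma, alpha, n_start, u_start, n_end):
    """Iterate u_{n+1} = (1 - 2 C gamma_n) u_n + gamma_n^2.

    Returns the list of pairs (n, n^alpha * u_n) for n_start <= n <= n_end,
    where gamma_n = c_gamma * n^(-alpha).
    """
    u = u_start
    scaled = []
    for n in range(n_start, n_end + 1):
        scaled.append((n, n ** alpha * u))
        gamma = c_gamma * n ** (-alpha)
        u = (1.0 - 2.0 * C * gamma) * u + gamma ** 2
    return scaled
```

With $C = c_\gamma = 1$, $\alpha = 3/4$ and a start at $n = 3$ (where $2C\gamma_n \leq 1$), the rescaled values $n^{\alpha} u_n$ never drop below $c_\gamma / (2C) = 1/2$, in line with a lower bound of order $n^{-\alpha}$.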


5.2 Proofs of the results given in Section 4.1


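Proof of Proposition 4.1. As in Pinelis (1994), let us define, for all integers $j$ and $n$ such that $2 \leq j \leq n$,

$$
\begin{aligned}
f_{j,n} &:= \sum_{k=1}^{j-1} \beta_{n-1,k} \xi_{k+1} , \\
d_{j,n} &:= f_{j,n} - f_{j-1,n} = \beta_{n-1,j-1} \xi_j , \\
e_{j,n} &:= \mathbb{E}\left[ e^{\|d_{j,n}\|} - 1 - \|d_{j,n}\| \,\Big|\, \mathcal{F}_{j-1} \right] ,
\end{aligned}
$$

with $f_{1,n} = 0$. Remark that for all $k \leq n - 1$,

$$
\mathbb{E}\left[ \beta_{n-1,k} \xi_{k+1} \,|\, \mathcal{F}_k \right] = 0 .
$$

It is not possible to apply directly Theorem 3.1 of Pinelis (1994) because the sequence $(\beta_{n-1,k} \xi_{k+1})$ is not properly a martingale differences sequence. As in Pinelis (1994), for all $t \in [0, 1]$, let us define $u(t) := \|x + tv\|$, with $x, v \in H$. We have for all $t \in [0, 1]$, $u'(t) \leq \|v\|$ and $(u^2(t))'' \leq 2\|v\|^2$. Moreover, since for all $u \in \mathbb{R}$, $\cosh u \geq \sinh u$, we also get

$$
(\cosh u)''(t) \leq \|v\|^2 \cosh u(t) .
$$

Let $\varphi(t) := \mathbb{E}\left[ \cosh\left( \|f_{j-1,n} + t d_{j,n}\| \right) \,|\, \mathcal{F}_{j-1} \right]$. Then

$$
\begin{aligned}
\varphi''(t) &\leq \mathbb{E}\left[ \|d_{j,n}\|^2 \cosh\left( \|f_{j-1,n} + t d_{j,n}\| \right) \,\Big|\, \mathcal{F}_{j-1} \right] \\
&\leq \mathbb{E}\left[ \|d_{j,n}\|^2 e^{t\|d_{j,n}\|} \cosh\left( \|f_{j-1,n}\| \right) \,\Big|\, \mathcal{F}_{j-1} \right] .
\end{aligned}
$$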

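Moreover, since $(\xi_n)$ is a sequence of martingale differences adapted to the filtration $(\mathcal{F}_n)$, for all $j \geq 1$, $\mathbb{E}\left[ d_{j,n} \,|\, \mathcal{F}_{j-1} \right] = 0$ and $\varphi'(0) = 0$. We get for all $j \geq 1$ such that $j \leq n$,

$$
\begin{aligned}
\mathbb{E}\left[ \cosh\left( \|f_{j,n}\| \right) \,|\, \mathcal{F}_{j-1} \right] &= \varphi(1) \\
&= \varphi(0) + \int_0^1 (1 - t) \varphi''(t) \, dt \\
&\leq (1 + e_{j,n}) \cosh\left( \|f_{j-1,n}\| \right) .
\end{aligned}
$$

Let $G_1 := 1$ and for all $2 \leq j \leq n$, let $G_j := \frac{\cosh\left( \|f_{j,n}\| \right)}{\prod_{i=2}^{j} (1 + e_{i,n})}$. Using the previous inequality, since $e_{j+1,n}$ is $\mathcal{F}_j$-measurable,

$$
\begin{aligned}
\mathbb{E}\left[ G_{j+1} \,|\, \mathcal{F}_j \right] &= \mathbb{E}\left[ \frac{\cosh\left( \|f_{j+1,n}\| \right)}{\prod_{i=2}^{j+1} (1 + e_{i,n})} \,\Big|\, \mathcal{F}_j \right] \\
&= \frac{\mathbb{E}\left[ \cosh\left( \|f_{j+1,n}\| \right) \,|\, \mathcal{F}_j \right]}{\prod_{i=2}^{j+1} (1 + e_{i,n})} \\
&\leq \frac{(1 + e_{j+1,n}) \cosh\left( \|f_{j,n}\| \right)}{\prod_{i=2}^{j+1} (1 + e_{i,n})} \\
&= G_j .
\end{aligned}
$$

By induction, $\mathbb{E}[G_n] \leq \mathbb{E}[G_1] \leq 1$. Finally,

$$
\begin{aligned}
\mathbb{P}\left[ \|f_{n,n}\| \geq r \right] &\leq \mathbb{P}\left[ G_n \geq \frac{\cosh r}{\left\| \prod_{j=2}^{n} (1 + e_{j,n}) \right\|} \right] \\
&\leq \mathbb{P}\left[ G_n \geq \frac{1}{2} \frac{\exp(r)}{\left\| \prod_{j=2}^{n} (1 + e_{j,n}) \right\|} \right] \\
&\leq 2 \mathbb{E}[G_n] e^{-r} \left\| \prod_{j=2}^{n} (1 + e_{j,n}) \right\| \\
&\leq 2 e^{-r} \left\| \prod_{j=2}^{n} (1 + e_{j,n}) \right\| .
\end{aligned}
$$

□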

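Proof of Theorem 4.1. Using Theorem 3.1, one can check that $\mathbb{E}\left[ \|\beta_{n-1} R_n\| \right] = O\left( \frac{1}{n^{\alpha}} \right)$. Indeed, applying Lemma 5.1,

$$
\begin{aligned}
\mathbb{E}\left[ \|\beta_{n-1} R_n\| \right] &\leq \sum_{k=1}^{n-1} \gamma_k \|\beta_{n-1} \beta_k^{-1}\| \, \mathbb{E}\left[ \|\delta_k\| \right] \\
&\leq C_m \sum_{k=1}^{n-1} \gamma_k \|\beta_{n-1} \beta_k^{-1}\| \, \mathbb{E}\left[ \|Z_k - m\|^2 \right] .
\end{aligned}
$$

Moreover, with calculus similar to the ones for the upper bound of the martingale term in the proof of Proposition 3.1 and applying Proposition 3.1,

$$
C_m \sum_{k=1}^{n-1} \gamma_k \|\beta_{n-1} \beta_k^{-1}\| \, \mathbb{E}\left[ \|Z_k - m\|^2 \right] \leq C_m C' c_\gamma \sum_{k=1}^{n-1} \frac{1}{k^{2\alpha}} \|\beta_{n-1} \beta_k^{-1}\| = O\left( \frac{1}{n^{\alpha}} \right) .
$$

Finally, the term $\beta_{n-1}(Z_1 - m)$ converges exponentially to 0. So, there are positive constants $C_1$, $C'_1$, $C_2$ such that

$$
\mathbb{P}\left[ \|Z_n - m\| \geq t \right] \leq \mathbb{P}\left[ \|\beta_{n-1} M_n\| \geq \frac{t}{2} \right] + \frac{C_1 e^{-C'_1 n^{1-\alpha}}}{t^2} + \frac{C_2}{n^{\alpha} t} . \qquad (38)
$$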

We now give a "good" choice of sequences $(N_n)$ and $(\sigma_n^2)$ to apply Corollary 4.2.

Step 1: Choice of $N_n$.

Using inequality (22) and since $\|\xi_{n+1}\| \leq 2$, we have $\|\beta_{n-1} \beta_k^{-1} \gamma_k \xi_{k+1}\| \leq 2 c_2 \gamma_k e^{-\lambda_{\min} \sum_{j=k+1}^{n-1} \gamma_j}$ if $k \neq n - 1$, where $\lambda_{\min}$ is the smallest eigenvalue of $\Gamma_m$. With calculus analogous to the ones for the bound of the martingale term in the proof of Proposition 3.1, one can check that if $k \leq n/2$,


1 7kfk+1 


< 2c 2 e 2Am ' nC 7 nl 11 y 1 . 
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Moreover, if k > n/2 and k n — 1, 

2 c 27 jte _AmtaE H t + 1 ' )7 < 2 c 2 7 , c < 2c 2 2 a c 7 —. 

n a 

Finally, if k = n — 1, 


—1 A; Tn—l£n 


< c 7 



Let Cat := max |sup n>1 je ^t” 1 ,2c2, l|, thus for all n > 1, 

su Pk<n-i{\\Pn-iPlirk£k+i\\} < %:■ So we take 


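Step 2: Choice of $\sigma_n^2$.

In the same way, for $n$ large enough, we have

$$
\sum_{k=1}^{n-1} \mathbb{E}\left[ \|\beta_{n-1} \beta_k^{-1} \gamma_k \xi_{k+1}\|^2 \,|\, \mathcal{F}_k \right] \leq \frac{2^{1+\alpha} c_\gamma}{c_m} \frac{1}{n^{\alpha}} .
$$

Indeed, we can split the sum into two parts: the first one converges exponentially fast to 0, and is smaller than the second one from a certain rank. For $n$ large enough, we can take

$$
\sigma_n^2 = \frac{2^{1+\alpha} c_\gamma}{c_m} \frac{1}{n^{\alpha}} .
$$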


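Using inequality (38) and Corollary 4.2,

$$
\mathbb{P}\left[ \|Z_n - m\| \geq t \right] \leq 2 \exp\left( - \frac{(t/2)^2}{2\left( \sigma_n^2 + N_n (t/2)/3 \right)} \right) + \frac{C_1 e^{-C'_1 n^{1-\alpha}}}{t^2} + \frac{C_2}{n^{\alpha} t} =: f(t, n) . \qquad (39)
$$

We look for values of $t$ for which $f(t, n) \leq \delta$. We seek to solve:

$$
\begin{aligned}
2 \exp\left( - \frac{(t/2)^2}{2\left( \sigma_n^2 + N_n t/6 \right)} \right) &\leq \delta/2 , \\
\frac{C_1 e^{-C'_1 n^{1-\alpha}}}{t^2} &\leq \delta/4 , \\
\frac{C_2}{n^{\alpha} t} &\leq \delta/4 .
\end{aligned}
$$

We get (see Tarrès and Yao (2014), Appendix A, for the exponential term):

$$
\begin{aligned}
t &\geq 4 \left( \frac{N_n}{3} + \sigma_n \right) \ln\left( \frac{4}{\delta} \right) , \\
t &\geq 2 \sqrt{\frac{C_1 e^{-C'_1 n^{1-\alpha}}}{\delta}} , \\
t &\geq \frac{4 C_2}{n^{\alpha} \delta} .
\end{aligned}
$$

Let us take a rank $n_\delta$ such that for all $n \geq n_\delta$,

$$
4 \left( \frac{N_n}{3} + \sigma_n \right) \ln\left( \frac{4}{\delta} \right) \geq \max \left\{ 2 \sqrt{\frac{C_1 e^{-C'_1 n^{1-\alpha}}}{\delta}} , \frac{4 C_2}{n^{\alpha} \delta} \right\} .
$$

Thus, for all $n \geq n_\delta$, with probability at least $1 - \delta$:

$$
\|Z_n - m\| \leq 4 \left( \frac{N_n}{3} + \sigma_n \right) \ln\left( \frac{4}{\delta} \right) .
$$

□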
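The first threshold can be checked numerically: with $t = 4\left( \frac{N_n}{3} + \sigma_n \right) \ln\left( \frac{4}{\delta} \right)$, the exponential term of (39) is indeed at most $\delta/2$ for any $\delta \in (0, 1]$. The sketch below evaluates both quantities; the function names and the numerical values of $\sigma_n$ and $N_n$ used with it are illustrative assumptions.

```python
import math


def bernstein_tail(t, sigma2, N):
    """Exponential term 2 exp(-(t/2)^2 / (2 (sigma2 + N t / 6)))."""
    return 2.0 * math.exp(-((t / 2.0) ** 2) / (2.0 * (sigma2 + N * t / 6.0)))


def martingale_radius(sigma, N, delta):
    """Threshold t = 4 (N / 3 + sigma) ln(4 / delta) for delta in (0, 1]."""
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must lie in (0, 1]")
    return 4.0 * (N / 3.0 + sigma) * math.log(4.0 / delta)
```

For example, `bernstein_tail(martingale_radius(0.1, 0.01, 0.05), 0.1 ** 2, 0.01)` stays below `0.025`. The radius depends on $\delta$ only through $\ln(4/\delta)$, whereas the two Markov-type terms depend on $\delta$ polynomially, which is why they are handled by the choice of the rank $n_\delta$.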


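5.3 Proof of Theorem 4.2

Since $\mathbb{E}\left[ \|Z_n - m\|^2 \right] \leq \frac{C'}{n^{\alpha}}$, applying Cauchy-Schwarz's inequality, we have $\mathbb{E}\left[ \|Z_n - m\| \right] \leq \frac{\sqrt{C'}}{n^{\alpha/2}}$.

These bounds are useful to prove that the first terms in equation (11) are negligible. Indeed,

$$
\begin{aligned}
\mathbb{E}\left[ \left\| \frac{T_{n+1}}{\gamma_n n} \right\|^2 \right] &= \frac{n^{2\alpha}}{c_\gamma^2 n^2} \mathbb{E}\left[ \|Z_{n+1} - m\|^2 \right] \\
&\leq \frac{n^{2\alpha}}{c_\gamma^2 n^2} \frac{C'}{(n+1)^{\alpha}} \\
&\leq \frac{C'}{c_\gamma^2} \frac{1}{n^{2-\alpha}} .
\end{aligned}
$$

Since $\alpha < 1$, we have that $\frac{1}{n^{2-\alpha}} = o\left( \frac{1}{n} \right)$. Moreover, since $0 \leq \gamma_k^{-1} - \gamma_{k-1}^{-1} \leq 2\alpha c_\gamma^{-1} k^{\alpha - 1}$ for all $k \geq 2$, there is a positive constant $C_1$ such that:

$$
\begin{aligned}
\mathbb{E}\left[ \left\| \frac{1}{n} \sum_{k=2}^{n} T_k \left( \gamma_k^{-1} - \gamma_{k-1}^{-1} \right) \right\| \right] &\leq \frac{2\alpha c_\gamma^{-1}}{n} \sum_{k=2}^{n} \mathbb{E}\left[ \|T_k\| \right] k^{\alpha - 1} \\
&\leq \frac{2\alpha c_\gamma^{-1} \sqrt{C'}}{n} \sum_{k=2}^{n} k^{\alpha/2 - 1} \\
&\leq \frac{C_1}{n^{1 - \alpha/2}} .
\end{aligned}
$$

Note also that since $\alpha < 1$, we have $1 - \alpha/2 > 1/2$. Moreover, since $\|\delta_n\| \leq C_m \|T_n\|^2$, there is a positive constant $C_2$ such that

$$
\begin{aligned}
\mathbb{E}\left[ \left\| \frac{1}{n} \sum_{k=1}^{n} \delta_k \right\| \right] &\leq \frac{C_m}{n} \sum_{k=1}^{n} \mathbb{E}\left[ \|T_k\|^2 \right] \\
&\leq \frac{C_m C'}{n} \sum_{k=1}^{n} \frac{1}{k^{\alpha}} \\
&\leq \frac{C_2}{n^{\alpha}} .
\end{aligned}
$$

Finally, there is a positive constant $C_3$ such that $\mathbb{E}\left[ \left\| \frac{T_1}{\gamma_1 n} \right\| \right] \leq \frac{C_3}{n}$.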

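We now study the martingale term. Let $M$ be a constant and $(\sigma_n)$ be a sequence of positive real numbers defined by:

$$
\begin{aligned}
M &:= 2 \geq \sup_{i} \|\xi_i\| , \\
\sigma_n^2 &:= n \geq \sum_{k=1}^{n} \mathbb{E}\left[ \|\xi_{k+1}\|^2 \,|\, \mathcal{F}_k \right] .
\end{aligned}
$$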

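Applying Pinelis-Bernstein's Lemma, we have for all $t > 0$,

$$
\mathbb{P}\left[ \sup_{1 \leq k \leq n} \|M_{k+1}\| \geq t \right] \leq 2 \exp\left( - \frac{t^2}{2\left( \sigma_n^2 + Mt/3 \right)} \right) .
$$

Consequently,

$$
\begin{aligned}
\mathbb{P}\left[ \frac{\|M_{n+1}\|}{n} \geq t \right] &\leq \mathbb{P}\left[ \sup_{1 \leq k \leq n} \|M_{k+1}\| \geq nt \right] \\
&\leq 2 \exp\left( - \frac{t^2 n^2}{2\left( \sigma_n^2 + Mtn/3 \right)} \right) \\
&\leq 2 \exp\left( - \frac{t^2}{2\left( \sigma_n^2/n^2 + Mt/(3n) \right)} \right) \\
&\leq 2 \exp\left( - \frac{t^2}{2\left( \sigma_n'^2 + N'_n t/3 \right)} \right) ,
\end{aligned}
$$

with $\sigma_n'^2 := n^{-1}$ and $N'_n := 2n^{-1}$. As in the proof of Theorem 4.1, there are three positive constants $C'_1$, $C'_2$ and $C'_3$ such that for all $t > 0$,

$$
\mathbb{P}\left[ \|\Gamma_m(\overline{Z}_n - m)\| \geq t \right] \leq 2 \exp\left( - \frac{(t/2)^2}{2\left( \sigma_n'^2 + N'_n t/6 \right)} \right) + \frac{C'_1}{t n^{1 - \alpha/2}} + \frac{C'_2}{t n^{\alpha}} + \frac{C'_3}{t n} =: g(t, n) .
$$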


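We search values of $t$ such that $g(t, n) \leq \delta$. We have to solve the following system of inequalities:

$$
\begin{aligned}
2 \exp\left( - \frac{(t/2)^2}{2\left( \sigma_n'^2 + N'_n t/6 \right)} \right) &\leq \delta/2 , \\
\frac{C'_1}{t n^{1 - \alpha/2}} &\leq \delta/6 , \\
\frac{C'_2}{t n^{\alpha}} &\leq \delta/6 , \\
\frac{C'_3}{t n} &\leq \delta/6 .
\end{aligned}
$$

We get (see Tarrès and Yao (2014), Appendix A, for the martingale term):

$$
\begin{aligned}
t &\geq 4 \left( \frac{N'_n}{3} + \sigma'_n \right) \ln\left( \frac{4}{\delta} \right) , \\
t &\geq \frac{6 C'_1}{\delta} \frac{1}{n^{1 - \alpha/2}} , \\
t &\geq \frac{6 C'_2}{\delta} \frac{1}{n^{\alpha}} , \\
t &\geq \frac{6 C'_3}{\delta} \frac{1}{n} .
\end{aligned}
$$

Since $\frac{N'_n}{3} + \sigma'_n = \frac{2}{3n} + \frac{1}{\sqrt{n}}$, the other terms are negligible for $n$ large enough and we can consider a rank $n_\delta$ as in (18).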
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